CN117075684B

CN117075684B - Self-adaptive clock gridding calibration method for Chiplet chip

Info

Publication number: CN117075684B
Application number: CN202311331000.5A
Authority: CN
Inventors: 王嘉诚; 张少仲
Original assignee: Zhongcheng Hualong Computer Technology Co Ltd
Current assignee: Zhongcheng Hualong Computer Technology Co Ltd
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2023-12-19
Anticipated expiration: 2043-10-16
Also published as: CN117075684A

Abstract

The invention discloses a self-adaptive clock gridding calibration method for a Chiplet chip, belonging to the technical field of integrated circuits. The method comprises the following steps: when the chip switches the type of task it executes, configuring a clock calibration global module, a plurality of clock calibration sub-modules and clock grids; collecting workload and clock skew data for all Chiplets during a first clock calibration period; predicting the workload and clock skew of each Chiplet; adjusting the position and size of the clock grids in the chip; selecting, in each grid, a Chiplet on which to deploy a clock calibration sub-module; and, in a second clock calibration period, having each clock calibration sub-module calibrate the clocks of the Chiplets in its grid according to instructions from the clock calibration global module. The method adapts to dynamic workload conditions and improves the performance and stability of multi-Chiplet systems.

Description

Self-adaptive clock gridding calibration method for Chiplet chip

Technical Field

The invention belongs to the technical field of integrated circuits, and particularly relates to a self-adaptive clock gridding calibration method of a Chiplet chip.

Background

With the development of integrated circuit technology, Chiplet-based chip design has become an important system integration approach. In this design, multiple Chiplets with different functions are integrated into a larger system to optimize performance and power consumption. Effectively calibrating the clock of each Chiplet therefore becomes critical. In a multi-Chiplet system, each Chiplet may have its own clock skew, which may arise from a number of factors. If not managed effectively, such clock skew may degrade system performance and may even cause system stability problems.

Conventional clock calibration methods typically rely on a global clock source, which may not be ideal in multi-Chiplet systems. Because the number of Chiplets may be very large, a global clock source may be unable to calibrate the clock skew of every Chiplet effectively. Furthermore, clock skew may change frequently with dynamic changes in workload and environmental conditions, which makes clock calibration more complex and difficult.

Therefore, performing efficient clock calibration in a multi-Chiplet system, especially under dynamic workload and environmental conditions, is an important issue in current integrated circuit design. A new clock calibration method is needed that can accurately predict and calibrate the clock skew of each Chiplet to improve system performance and stability.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a self-adaptive clock gridding calibration method for a Chiplet chip, which comprises the following steps:

Step 1, when the chip switches the type of task it executes, configuring a clock calibration global module, and configuring, according to the task type, a plurality of clock calibration sub-modules and a clock grid corresponding to each clock calibration sub-module;

Step 2, collecting the workload and clock skew data of all Chiplets in the chip in a first clock calibration period;

Step 3, predicting the workload and clock skew of each Chiplet in the chip with a prediction model, based on the type of task executed by the Chiplet chip and the collected workload and clock skew data of all Chiplets in the chip;

Step 4, adjusting the position and size of the clock grids in the chip according to the predicted workload and clock skew of each Chiplet, wherein each clock grid contains a Chiplet group and each Chiplet group contains at least one Chiplet;

Step 5, selecting, from the at least one Chiplet in each grid, a Chiplet on which to deploy a clock calibration sub-module;

Step 6, in a second clock calibration period, each clock calibration sub-module calibrating the clock of each Chiplet in its grid according to the instructions it receives from the clock calibration global module, wherein the second clock calibration period is shorter than the first clock calibration period.

Configuring the clock calibration global module comprises setting the number of clock calibration sub-modules it manages, initializing the global clock distribution and clock skew, and setting the modes in which information is reported and collected between the clock calibration global module and the clock calibration sub-modules;

Configuring a clock calibration sub-module comprises setting the position and size of the clock grid in which it is located, initializing the clock distribution and clock skew, and setting the modes in which information is reported and collected between the clock calibration sub-module and the clock calibration global module.

The workload and clock skew of each Chiplet are predicted with a prediction model, based on the type of task executed by the Chiplet chip and the collected workload and clock skew data of all Chiplets in the chip, wherein the prediction uses a nonlinear regression model.

Different Gaussian process models are set up for the different task types executed by the Chiplet chip, and the workload of each Chiplet in the chip is predicted with the Gaussian process model for the current task type.

For predicting the clock skew of each Chiplet, the nonlinear model takes 6 input features, namely the Chiplet's current workload current_workload, core frequency F, power consumption P, current clock skew, and combined correlation terms, and outputs the clock skew in the next clock calibration period;

next_clock_skew is determined by the following nonlinear relationship:

[Formulas not preserved in the source text.]

wherein,

x is a vector containing the six input features;

the parameter matrices of the model are learned from a training data set by stochastic gradient descent.
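
The formulas for next_clock_skew are not preserved, so the following is only a minimal sketch of one plausible nonlinear regression of this kind, not the patent's model. It assumes a single hidden layer with tanh activation, assumes the two combined correlation features are P·F and current_workload·current_clock_skew, and trains the parameter matrices with plain per-sample stochastic gradient descent on squared error. All names (`SkewMLP`, `make_features`) and the synthetic training target are hypothetical.

```python
import numpy as np


def make_features(P, F, workload, skew):
    # Hypothetical: the two "combined correlation" features are taken as P*F and workload*skew.
    return np.array([P, F, workload, skew, P * F, workload * skew])


class SkewMLP:
    """Assumed form: next_clock_skew = W2 @ tanh(W1 @ x + b1) + b2."""

    def __init__(self, n_in=6, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(1, n_hidden))
        self.b2 = np.zeros(1)

    def predict(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return float((self.W2 @ h + self.b2)[0])

    def sgd_step(self, x, y, lr=0.05):
        """One stochastic gradient descent step on the squared error 0.5*(y_hat - y)^2."""
        h = np.tanh(self.W1 @ x + self.b1)
        err = (self.W2 @ h + self.b2)[0] - y
        grad_z = err * self.W2[0] * (1.0 - h ** 2)  # back-propagate through tanh (uses old W2)
        self.W2 -= lr * err * h[None, :]
        self.b2 -= lr * err
        self.W1 -= lr * np.outer(grad_z, x)
        self.b1 -= lr * grad_z


# Usage on synthetic data (hypothetical target, for illustration only).
rng = np.random.default_rng(1)
raw = rng.uniform(0.0, 1.0, size=(200, 4))  # columns: P, F, workload, skew
data = [(make_features(*r), 0.3 * r[3] + 0.1 * r[0] * r[1]) for r in raw]
model = SkewMLP()


def mse():
    return float(np.mean([(model.predict(x) - y) ** 2 for x, y in data]))


before = mse()
for _ in range(30):
    for idx in rng.permutation(len(data)):
        model.sgd_step(*data[idx])
after = mse()
```

Feature scaling, mini-batching and a held-out validation set would normally be added; they are left out to keep the sketch short.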

The position and size of the clock grids in the chip are adjusted according to the predicted workload and clock skew of each Chiplet, wherein each clock grid contains a Chiplet group comprising a plurality of Chiplets, and the Chiplets in the chip are divided into clock grids through the following steps:

Step a, defining and calculating two metrics between Chiplets, similarity and distance, based on the Chiplets' physical positions, predicted workloads and predicted clock skews;

Step b, clustering the Chiplets: after the similarity and distance between Chiplets have been calculated, grouping the Chiplets with the density-based DBSCAN algorithm.

A similarity function s(i, j) is defined to calculate the similarity of chiplet_i and chiplet_j. s(i, j) is computed from the predicted workload and predicted clock skew of each Chiplet, i.e. the workload and clock skew predicted for the next clock calibration period;

The similarity function s(i, j) is calculated as follows:

$$s(i, j) = \exp\left(-\frac{\mathrm{abs}(\mathrm{workload}(i) - \mathrm{workload}(j))}{\sigma_w} - \frac{\mathrm{abs}(\mathrm{clock\_skew}(i) - \mathrm{clock\_skew}(j))}{\sigma_c}\right),$$

wherein,

workload(i) and workload(j) represent the predicted workloads of chiplet_i and chiplet_j, respectively;

clock_skew(i) and clock_skew(j) represent the predicted clock skews of chiplet_i and chiplet_j, respectively;

abs() represents the absolute value function;

exp() is the exponential function with base e;

$\sigma_w$ and $\sigma_c$ are normalization factors, obtained as the standard deviations of the workloads and clock skews of all Chiplets, respectively;

A distance function d(i, j) is defined to calculate the wiring distance between chiplet_i and chiplet_j.
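
A minimal Python sketch of the similarity function, assuming the form exp(-abs(w_i - w_j)/sigma_w - abs(c_i - c_j)/sigma_c), with sigma_w and sigma_c taken as population standard deviations of the predicted workloads and clock skews. The helper name `make_similarity`, the fallback for zero spread and the sample values are hypothetical.

```python
import math
import statistics


def make_similarity(workloads, skews):
    """Return s(i, j) under the assumed form
    exp(-|w_i - w_j| / sigma_w - |c_i - c_j| / sigma_c)."""
    # Normalization factors: population standard deviations over all Chiplets.
    # Falling back to 1.0 when all values are equal is an added assumption.
    sigma_w = statistics.pstdev(workloads) or 1.0
    sigma_c = statistics.pstdev(skews) or 1.0

    def s(i, j):
        return math.exp(-abs(workloads[i] - workloads[j]) / sigma_w
                        - abs(skews[i] - skews[j]) / sigma_c)

    return s


# Hypothetical predicted workloads and clock skews for three Chiplets.
s = make_similarity([0.20, 0.25, 0.90], [1.0, 1.1, 3.0])
```

Identical Chiplets score exactly 1, and the score decays toward 0 as either predicted quantity diverges, so a single threshold on s(i, j) can gate both at once.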

In step b, grouping the Chiplets with the density-based DBSCAN algorithm comprises the following steps:

Step b.1, for each Chiplet, finding all Chiplets whose distance from it is less than ε and whose similarity to it is greater than θ;

Step b.2, if the number of neighbors found for a Chiplet in step b.1 is greater than a preset minimum number of neighbors MinPts, treating the Chiplet as a core point;

Step b.3, placing all Chiplets that are density-connected to a core point in one group; if several core points are neighbors of one another, adjusting ε and θ and repeating step b until each group contains only one core point.
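
Steps b.1 to b.3 can be illustrated with a small hand-written density grouping. This is a sketch rather than a library DBSCAN: it assumes d(i, j) and s(i, j) are symmetric, links core points that are neighbors of one another into one group, and attaches each boundary point to its closest core neighbor (the rule given in a later embodiment). `group_chiplets` is a hypothetical name; a `cores_per_group` value above 1 signals the "several core points are neighbors" case that triggers re-tuning ε and θ.

```python
def group_chiplets(n, d, s, eps, theta, min_pts):
    """Group Chiplets 0..n-1 with the density rule of steps b.1-b.3.

    d(i, j): wiring distance, s(i, j): similarity (both assumed symmetric).
    Returns (labels, noise, cores_per_group)."""
    # Step b.1: neighbors are closer than eps AND more similar than theta.
    nbrs = {i: [j for j in range(n)
                if j != i and d(i, j) < eps and s(i, j) > theta]
            for i in range(n)}
    # Step b.2: core points have more than min_pts neighbors.
    core = [i for i in range(n) if len(nbrs[i]) > min_pts]
    core_set = set(core)
    labels, next_group = {}, 0
    # Step b.3: core points reachable through core-to-core links share a group.
    for c in core:
        if c in labels:
            continue
        labels[c] = next_group
        stack = [c]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if q in core_set and q not in labels:
                    labels[q] = next_group
                    stack.append(q)
        next_group += 1
    # Boundary points join the group of their closest core neighbor.
    for i in range(n):
        core_nbrs = [c for c in nbrs[i] if c in core_set]
        if i not in core_set and core_nbrs:
            labels[i] = labels[min(core_nbrs, key=lambda c: d(i, c))]
    noise = [i for i in range(n) if i not in labels]
    cores_per_group = {g: sum(1 for c in core if labels[c] == g)
                       for g in range(next_group)}
    return labels, noise, cores_per_group


# Hypothetical 1-D layout; similarity fixed at 1.0 to isolate the distance rule.
pos = [0, 1, 2, 10, 11, 12, 30]
labels, noise, cores = group_chiplets(
    len(pos), lambda i, j: abs(pos[i] - pos[j]), lambda i, j: 1.0,
    eps=1.5, theta=0.5, min_pts=1)
```

Here Chiplets 1 and 4 are the only core points, the Chiplets beside them become boundary points of their groups, and the isolated Chiplet 6 is left as noise.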

In step 5, within each grid of Chiplets determined in step 4, a Chiplet on which to deploy the clock calibration sub-module is selected from the plurality of Chiplets through the following procedure:

determining, for each Chiplet, the processing resources remaining under its expected load, from its maximum load capacity and expected load;

after evaluating the remaining processing resources of all Chiplets, performing a preliminary screening to remove Chiplets with insufficient remaining resources;

for each Chiplet that passes the screening, obtaining its wiring distances to all other Chiplets in the grid;

once all wiring distances have been obtained, selecting the Chiplet with the smallest average wiring distance to the other Chiplets in the clock grid to host the clock calibration sub-module.
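
The host-selection procedure can be sketched as follows. `select_host` and the `min_headroom` screening threshold are hypothetical; the source does not say how "insufficient" remaining resources are quantified. The average is taken over all other Chiplets in the grid, and ties are broken by the lowest Chiplet id, which is also an assumption.

```python
def select_host(members, max_load, expected_load, wiring, min_headroom):
    """Pick the Chiplet in one grid that hosts the clock calibration sub-module.

    members: Chiplet ids in the grid; wiring(a, b): wiring distance.
    min_headroom: hypothetical threshold for "insufficient remaining resources"."""
    # Remaining processing resources under the expected load.
    remaining = {c: max_load[c] - expected_load[c] for c in members}
    # Preliminary screening.
    candidates = [c for c in members if remaining[c] >= min_headroom]
    if not candidates:
        return None

    def avg_wiring(c):
        others = [o for o in members if o != c]
        return sum(wiring(c, o) for o in others) / len(others) if others else 0.0

    # Smallest average wiring distance to the other Chiplets; ties -> lowest id.
    return min(candidates, key=lambda c: (avg_wiring(c), c))


pos = {0: 0, 1: 1, 2: 2, 3: 5}  # hypothetical positions used as wiring distance
host = select_host(
    members=[0, 1, 2, 3],
    max_load={c: 10 for c in pos},
    expected_load={0: 2, 1: 9, 2: 3, 3: 1},
    wiring=lambda a, b: abs(pos[a] - pos[b]),
    min_headroom=2)
```

In this example Chiplet 1 ties with Chiplet 2 on average wiring distance (2.0 each) but is removed in the screening step because only 1 unit of capacity remains, so Chiplet 2 is chosen.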

In step 6, in a second clock calibration period, which is shorter than the first clock calibration period, the clock calibration sub-modules calibrate the clocks of the Chiplets in their grids according to the instructions received from the clock calibration global module, which comprises:

at the beginning of the second clock calibration period, the clock calibration global module sending instructions, including pre-calibration information, to each clock calibration sub-module, requiring it to start a new calibration period;

in each second clock calibration period within each first clock calibration period, the clock calibration global module pre-compensating the clock offset of each Chiplet according to the prediction data obtained in step 3 to determine the pre-calibration information, and determining the clock management Chiplet corresponding to each Chiplet;

the clock calibration global module sending the clock offset compensation for each Chiplet to the corresponding clock management Chiplet.

With the invention, the workload and clock skew of each Chiplet can be predicted in advance with the prediction model, so that clock calibration is performed ahead of time and system stability is improved. Adjusting the position and size of the clock grids in the chip according to the type of task executed by the Chiplet chip and the collected workload data allows the system to adapt to different workload conditions. Meanwhile, gridded calibration confines each calibration to a smaller region, which makes calibration more flexible and easier to perform when the system is scaled up. Finally, because one Chiplet in each grid hosts a clock calibration sub-module, each sub-module is responsible for only a small subset of Chiplets, which improves resource utilization and reduces calibration complexity.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

Fig. 1 is a flowchart illustrating a self-adaptive clock gridding calibration method for a Chiplet chip according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality of" generally means at least two.

It should be understood that although the terms first, second, third, etc. may be used to describe … … in embodiments of the present invention, these … … should not be limited to these terms. These terms are only used to distinguish … …. For example, the first … … may also be referred to as the second … …, and similarly the second … … may also be referred to as the first … …, without departing from the scope of embodiments of the present invention.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; e.g., A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects.

The word "if", as used herein, may be interpreted as "at … …", "when … …", "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined", "in response to determination", "when detected (stated condition or event)" or "in response to detection (stated condition or event)", depending on the context.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or elements inherent to such product or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other identical elements in the product or apparatus comprising that element.

In a Chiplet system, each Chiplet may have its own clock skew, which affects the performance and stability of the overall system. Such clock skew may become more complex and harder to predict, especially under dynamic workload conditions, and conventional clock calibration methods may not be effective in this situation. The invention addresses the problem of effectively managing clock skew in a multi-Chiplet system.

As shown in fig. 1, the invention discloses a self-adaptive clock gridding calibration method of a Chiplet chip, which comprises the following steps:

Step 1, when the chip switches the type of task it executes, configuring a clock calibration global module, and configuring, according to the task type, a plurality of clock calibration sub-modules and a clock grid corresponding to each clock calibration sub-module.

Step 2, collecting the workload and clock skew data of all Chiplets in the chip in a first clock calibration period.

Step 3, predicting the workload and clock skew of each Chiplet with a prediction model, based on the type of task executed by the Chiplet chip and the collected workload and clock skew data of all Chiplets in the chip.

Step 4, adjusting the position and size of the clock grids in the chip according to the predicted workload and clock skew of each Chiplet, wherein each clock grid contains a Chiplet group and each group contains at least one Chiplet.

Step 5, selecting, from the at least one Chiplet in each grid, a Chiplet on which to deploy a clock calibration sub-module.

In one embodiment, the clock calibration global module coordinates and manages all clock calibration sub-modules within the Chiplet chip, and performs global clock calibration according to the states of all clock calibration sub-modules and their respective clock grids.

Each clock calibration sub-module is responsible for clock calibration of the clock grid in which it is located, and feeds back the workload, clock distribution and clock skew data of its grid to the clock calibration global module in real time.

In one embodiment, configuring the clock calibration global module includes setting the number of clock calibration sub-modules it manages, initializing the global clock distribution and clock skew (initialized to 0), and setting the information reporting and collecting modes of the clock calibration global module and the clock calibration sub-modules.
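
A sketch of the global-module configuration as a Python dataclass. Only the sub-module count, a reporting mode and the zero-initialized clock skew are modeled; the field names and the two `ReportMode` values are hypothetical, since the source does not enumerate the reporting and collection modes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReportMode(Enum):
    # Hypothetical reporting/collection modes; the source does not name them.
    PERIODIC_PUSH = "periodic_push"  # sub-modules push data every period
    ON_REQUEST = "on_request"        # global module polls the sub-modules


@dataclass
class GlobalModuleConfig:
    num_submodules: int
    report_mode: ReportMode = ReportMode.PERIODIC_PUSH
    clock_skew: dict = field(default_factory=dict)  # Chiplet id -> skew

    def initialize(self, chiplet_ids):
        # Global clock skew starts at 0 for every Chiplet.
        self.clock_skew = {c: 0.0 for c in chiplet_ids}
        return self


cfg = GlobalModuleConfig(num_submodules=4).initialize(range(8))
```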

In one embodiment, the initial clock grids are established based on the physical layout of the Chiplet chip and the expected workload of the task type being performed. For example, if certain types of Chiplets are expected to carry higher workloads for a given task type, the clock grids containing them may need to be smaller so that clock calibration can be performed more finely. As another example, Chiplets expected to carry higher workloads may be assigned higher clock frequencies. The initial clock grids may be uniform, equally sized grids dividing the chip interior, with the grid size differing between task types. Alternatively, each Chiplet may form its own grid at the initial stage, or the grids may be divided by Chiplet type so that Chiplets with the same computing function fall in the same grid. The Chiplet hosting the clock calibration sub-module in each initial grid may be selected at random.
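
One of the initial options, a uniform grid whose cell size depends on the task type, might look like this. The task-type table `GRID_SHAPE_BY_TASK`, the Chiplet coordinates and the function name are hypothetical illustrations.

```python
# Hypothetical task-type -> (columns, rows) table; smaller cells for heavier tasks.
GRID_SHAPE_BY_TASK = {"light": (2, 2), "heavy": (4, 4)}


def uniform_grids(chiplet_xy, chip_w, chip_h, task_type):
    """Assign each Chiplet, by its (x, y) position, to a cell of a uniform grid."""
    cols, rows = GRID_SHAPE_BY_TASK[task_type]
    grids = {}
    for cid, (x, y) in chiplet_xy.items():
        # Clamp so Chiplets on the far edge fall into the last cell.
        gx = min(int(x / chip_w * cols), cols - 1)
        gy = min(int(y / chip_h * rows), rows - 1)
        grids.setdefault((gx, gy), []).append(cid)
    return grids


layout = {"c0": (1, 1), "c1": (6, 1), "c2": (1, 6), "c3": (9, 9), "c4": (10, 10)}
light = uniform_grids(layout, 10, 10, "light")
```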

In one embodiment, different Gaussian process models are set up for the different task types (task_types) executed by the Chiplet chip, and the workload of each Chiplet in the chip is predicted with the Gaussian process model for the current task type. For example, Gaussian process model A is trained for task type A, Gaussian process model B is trained for task type B, and so on. This improves prediction performance when different task types exhibit different behavior patterns.
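
Keeping a separate model per task type amounts to partitioning the training samples by task type and fitting one model on each partition. The sketch below shows only that bookkeeping; to stay short it uses a hypothetical one-dimensional least-squares stand-in (`LinearStandIn`) where the method would train a Gaussian process.

```python
class LinearStandIn:
    """Stand-in 1-D least-squares model; the method itself uses a Gaussian process."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = cov / var if var else 0.0
        self.b = my - self.a * mx
        return self

    def predict(self, x):
        return self.a * x + self.b


def train_per_task(samples, factory=LinearStandIn):
    """samples: (task_type, current_workload, next_workload) triples.
    Trains one independent model per task type on that task's samples only."""
    by_task = {}
    for task, x, y in samples:
        xs, ys = by_task.setdefault(task, ([], []))
        xs.append(x)
        ys.append(y)
    return {task: factory().fit(xs, ys) for task, (xs, ys) in by_task.items()}


models = train_per_task([("A", 0.0, 0.0), ("A", 1.0, 2.0), ("A", 2.0, 4.0),
                         ("B", 0.0, 1.0), ("B", 1.0, 2.0)])
```

Swapping in a different model class only requires passing another `factory`; the partitioning logic stays the same.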

In one embodiment, the workload and clock skew data of all Chiplets in the chip are collected during the first clock calibration period. The length of the first clock calibration period used for data collection is determined by the requirements and performance of the system. Data collection is accomplished through a hardware interface (e.g., reading the workload and clock information of the Chiplet over a bus).

In one embodiment, predicting the workload and clock skew of each Chiplet in the chip with the prediction model comprises using a nonlinear regression model, based on the type of task executed by the Chiplet chip and the collected workload and clock skew data of all Chiplets in the chip.

For workload prediction, the input for each Chiplet is [current_workload] and the output is the workload in the next clock calibration period; the workload prediction for each Chiplet in the chip is thus a univariate nonlinear regression model.

Given the current workload (current_workload), the workload in the next clock calibration period is predicted. The input data are first normalized, scaling the input parameter (i.e., the current workload) to the range [0, 1].

The Gaussian process model is trained on a training data set (input parameters and the corresponding workloads). During training, the RBF kernel is selected as the kernel function, and its parameters (such as the length scale and variance) are optimized to maximize the marginal likelihood of the training data. The following matrix is computed in the training phase: $k(X_{train}, X_{train}) + \sigma_n^2 I$, where $k(X_{train}, X_{train})$ is the kernel matrix of the training data, $\sigma_n^2$ is the white noise term, and $I$ is the identity matrix.

For a new input parameter $x_*$ (i.e., a new current workload), the new workload is predicted with the following formulas:

$$\mu_* = k_*^{\top} \left(k(X_{train}, X_{train}) + \sigma_n^2 I\right)^{-1} y_{train},$$

$$\sigma_*^2 = k(x_*, x_*) - k_*^{\top} \left(k(X_{train}, X_{train}) + \sigma_n^2 I\right)^{-1} k_*;$$

wherein,

$\mu_*$ is the mean of the prediction, i.e. the workload in the next clock calibration period;

$\sigma_*^2$ is the variance of the prediction, representing its uncertainty;

$k_*$ is the vector of kernel values between the new input parameter and the training data;

$k(x_*, x_*)$ is the kernel value of the new input parameter with itself;

$y_{train}$ is the vector of output values of the training data.

In Gaussian process regression, the kernel function measures the similarity between data points.

$k(X_{train}, X_{train})$, $k_*$ and $k(x_*, x_*)$ are all computed with the kernel function, wherein:

$k_*$ is a vector whose elements are the kernel values between the new input parameter $x_*$ and each element of the training data $X_{train}$. For example, if $X_{train}$ has n elements, $k_*$ is an n-dimensional vector whose i-th element is $k(x_*, X_{train}[i])$.

$k(x_*, x_*)$ is a scalar, the kernel value of the new input parameter $x_*$ with itself.

$k(X_{train}, X_{train})$ is an n×n matrix (n being the number of training data points) whose element in row i and column j is $k(X_{train}[i], X_{train}[j])$, i.e. the kernel value between the i-th and j-th training data points.

The kernel function is the radial basis function (RBF), expressed as:

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2 l^2}\right),$$

wherein $\lVert x - y \rVert^2$ is the squared Euclidean distance between x and y, and l is the length scale parameter, which controls the width of the kernel. The larger the length scale l, the wider the kernel and the longer the range of dependencies in the data it can capture.
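
The training and prediction formulas can be sketched with NumPy for the one-dimensional workload input. This is a minimal illustration under stated assumptions: the length scale and noise variance are fixed rather than optimized by maximizing the marginal likelihood, the targets are centred on their mean before fitting (an addition not described in the source), linear solves replace the explicit matrix inverse, and the workload history is invented.

```python
import numpy as np


def rbf(a, b, length_scale):
    """RBF kernel matrix exp(-(a_i - b_j)^2 / (2 l^2)) for 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * length_scale ** 2))


def gp_predict(x_train, y_train, x_new, length_scale=0.2, noise_var=1e-3):
    """Posterior mean and variance at x_new. Hyperparameters are fixed here
    (assumed values) instead of being fitted by maximizing the marginal likelihood."""
    y_mean = y_train.mean()  # centre targets so the zero-mean prior is reasonable
    K = rbf(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    k_star = rbf(x_train, np.atleast_1d(x_new), length_scale)[:, 0]
    mean = y_mean + k_star @ np.linalg.solve(K, y_train - y_mean)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)  # k(x*, x*) = 1 for this kernel
    return float(mean), float(var)


# Hypothetical history: current workloads (%) and the workloads observed one period later.
now = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
nxt = np.array([22.0, 38.0, 49.0, 70.0, 78.0])
lo, hi = now.min(), now.max()


def scale(v):
    return (v - lo) / (hi - lo)  # min-max normalization to [0, 1]


mean_mid, var_mid = gp_predict(scale(now), nxt, scale(50.0))
mean_far, var_far = gp_predict(scale(now), nxt, scale(200.0))
```

Near a training point the predictive variance is close to the noise level, while far outside the training range it approaches k(x_*, x_*) = 1 and the mean reverts to the training average, which is the expected behavior of a zero-mean GP on centred targets.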

In one embodiment, the direct inputs for predicting the clock skew of each Chiplet are [Chiplet power consumption (P), Chiplet core frequency (F), Chiplet current workload (current_workload), Chiplet current clock skew (current_clock_skew)], and the output is the clock skew in the next clock calibration period. Each Chiplet thus has 6 input features, namely power consumption P, core frequency F, current workload current_workload, current clock skew current_clock_skew, and combined correlation terms, and one output (the clock skew in the next clock calibration period). next_clock_skew is determined by the following nonlinear relationship:

[Formulas not preserved in the source text.]

wherein,

x is a vector containing the six input features;

In one embodiment, the position and size of the clock grids in the chip are adjusted according to the predicted workload and clock skew of each Chiplet, wherein each clock grid contains a Chiplet group comprising a plurality of Chiplets, and dividing the Chiplets in the chip into clock grids comprises the following steps:

Step a, calculating the similarity and distance between Chiplets: two metrics, similarity and distance, are defined between any two Chiplets and can be calculated from the Chiplets' physical positions, predicted workloads and predicted clock skews.

A similarity function s(i, j) is defined to calculate the similarity of chiplet_i and chiplet_j; s(i, j) may be calculated from the predicted workload and clock skew of the Chiplets.

Optionally, s(i, j) is calculated from the predicted workload and predicted clock skew of each Chiplet, i.e. the workload and clock skew predicted for the next clock calibration period;

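the similarity function s(i, j) is calculated as follows:

$$s(i, j) = \exp\left(-\frac{\mathrm{abs}(\mathrm{workload}(i) - \mathrm{workload}(j))}{\sigma_w} - \frac{\mathrm{abs}(\mathrm{clock\_skew}(i) - \mathrm{clock\_skew}(j))}{\sigma_c}\right),$$

wherein,

abs() represents the absolute value function;

exp() is the exponential function with base e;

$\sigma_w$ and $\sigma_c$ are normalization factors obtained as the standard deviations of the predicted workloads and clock skews of all Chiplets.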

Optionally, a distance function d(i, j) is defined to calculate the wiring distance between chiplet_i and chiplet_j, i.e. d(i, j) is the wiring distance between the two Chiplets.

Step b, clustering the Chiplets: after the similarity and distance between Chiplets have been calculated, the Chiplets are grouped with the density-based DBSCAN algorithm, which places density-connected Chiplets (i.e., Chiplets whose distance is below a threshold and whose similarity is above a threshold) in the same group, through the following process:

Step b.1, for each Chiplet, finding all Chiplets whose distance from it is less than ε (a preset distance threshold) and whose similarity to it is greater than θ (a preset similarity threshold).

Step b.2, if the number of neighbors found for a Chiplet in step b.1 is greater than MinPts (a preset minimum number of neighbors), the Chiplet is treated as a core point.

In one embodiment, the values of ε and θ are adjusted, by trying different parameter combinations, so that each group contains only one core point. When the clustering result is too compact, including when several core points are neighbors of one another, ε and θ are adjusted step by step: $\varepsilon \leftarrow \varepsilon - \Delta\varepsilon$ and $\theta \leftarrow \theta + \Delta\theta$, where $\Delta\varepsilon > 0$ and $\Delta\theta > 0$ are preset step sizes.

In one embodiment, when the clustering result is too dispersed, including when the number of noise points in the clustering result exceeds a preset number, the values of ε and θ are adjusted step by step: $\varepsilon \leftarrow \varepsilon + \Delta\varepsilon$ and $\theta \leftarrow \theta - \Delta\theta$, where $\Delta\varepsilon > 0$ and $\Delta\theta > 0$. A Chiplet that is neither a core point nor a boundary point is treated as a noise point.
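
The two adjustment rules can be condensed into a single update function. The additive form, the default step sizes and the clamping of ε and θ are assumptions made for this sketch; `adjust_thresholds` is a hypothetical name.

```python
def adjust_thresholds(eps, theta, cores_per_group, noise_count,
                      max_noise, d_eps=0.1, d_theta=0.05):
    """One adjustment step for the density-grouping thresholds.

    Assumed additive rule with hypothetical step sizes d_eps, d_theta > 0:
    too compact (a group holds several core points) -> shrink eps, raise theta;
    too dispersed (noise points exceed max_noise)   -> grow eps, lower theta."""
    if any(k > 1 for k in cores_per_group.values()):
        # Keep eps positive and theta within the (0, 1] range of s(i, j).
        return max(eps - d_eps, d_eps), min(theta + d_theta, 1.0)
    if noise_count > max_noise:
        return eps + d_eps, max(theta - d_theta, 0.0)
    return eps, theta  # clustering accepted as is
```

In practice it would be called in a loop (cluster, inspect, adjust) with an iteration cap, since the two rules can otherwise oscillate between a too-compact and a too-dispersed result.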

In one embodiment, a Chiplet whose number of neighbors does not reach MinPts but which is a neighbor of at least one core point is treated as a boundary point. A boundary point is assigned to the group of one of its core points; if it is a neighbor of several core points, it is assigned to the group of the closest one.

In one embodiment, noise points are grouped according to the following rules:

the expected clock offset and expected workload of each noise point, determined in step 3, are obtained.

Noise points are divided into four categories according to expected clock offset and load: high clock offset/high load, high clock offset/low load, low clock offset/high load, and low clock offset/low load.

If both the clock offset and the load of a noise point are above the corresponding thresholds set for noise point classification, it is classified as high clock offset/high load. If the clock offset is above its threshold but the load is below its threshold, it is classified as high clock offset/low load. If the clock offset is below its threshold but the load is above its threshold, it is classified as low clock offset/high load. If both the clock offset and the load are below their thresholds, it is classified as low clock offset/low load.

Noise points with high clock offset (both high load and low load) receive independent clock management: they are not placed in any existing clock management grid, and a clock calibration sub-module is deployed on the Chiplet corresponding to each such noise point.

Noise points with low clock offset (both high load and low load) are assigned to the closest group.
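
The noise-point rules can be sketched as below. The function name, the treatment of values equal to a threshold as "low", and the reading of "closest group" as the group of the nearest already-grouped Chiplet are assumptions.

```python
def handle_noise(noise, skew, load, skew_thr, load_thr, labels, d):
    """Classify noise Chiplets and decide how each one is handled.

    labels: Chiplet id -> group id for already-grouped Chiplets (must be non-empty
    when any low-offset noise point exists); d(i, j): wiring distance.
    Values equal to a threshold count as "low" (an assumption)."""
    categories, own_grids, assigned = {}, [], {}
    for i in noise:
        high_skew, high_load = skew[i] > skew_thr, load[i] > load_thr
        categories[i] = (("high" if high_skew else "low") + " clock offset / "
                         + ("high" if high_load else "low") + " load")
        if high_skew:
            own_grids.append(i)  # independent grid; sub-module deployed on i itself
        else:
            closest = min(labels, key=lambda c: d(i, c))
            assigned[i] = labels[closest]  # join the group of the nearest grouped Chiplet
    return categories, own_grids, assigned


pos = {0: 0, 3: 10, 6: 30, 7: 9}  # hypothetical positions used as wiring distance
cats, own, joined = handle_noise(
    [6, 7], skew={6: 5.0, 7: 0.2}, load={6: 0.9, 7: 0.8},
    skew_thr=1.0, load_thr=0.5, labels={0: 0, 3: 1},
    d=lambda i, j: abs(pos[i] - pos[j]))
```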

In one embodiment, in step 5, selecting, within each grid of Chiplets determined in step 4, a Chiplet on which to deploy the clock calibration sub-module comprises the following steps:

the processing resources remaining for each Chiplet under its expected load are determined from the Chiplet's specification parameters (maximum load capacity) and its expected load.

After the remaining processing resources of all Chiplets have been evaluated, a preliminary screening removes Chiplets with insufficient resources.

For each Chiplet that passes the screening, its wiring distances to all other Chiplets in the grid are obtained. Once all wiring distances have been obtained, the Chiplet with the smallest average wiring distance to the other Chiplets is selected to host the clock calibration sub-module.

In an embodiment, in step 6, in a second clock calibration period, which is shorter than the first clock calibration period, the clock calibration sub-modules calibrate the clocks of the Chiplets in their grids according to the instructions received from the clock calibration global module, which comprises:

determining the clock management Chiplet corresponding to each Chiplet: if a clock calibration sub-module is deployed on the Chiplet, the Chiplet is its own clock management Chiplet; otherwise, its clock management Chiplet is the Chiplet in the same grid on which the clock calibration sub-module is deployed.

The clock calibration global module sends the clock offset compensation of each Chiplet to the corresponding clock management Chiplet; in this way the global module adjusts the clock calibration strategy before the actual offset occurs, reducing the actual clock offset.
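
Determining the clock management Chiplet and routing the compensation values can be sketched as follows. `build_compensation_messages` is a hypothetical name, and taking the compensation as the negated predicted offset is an assumption; the source does not give the compensation formula.

```python
def build_compensation_messages(grids, hosts, predicted_offset):
    """Map every Chiplet to its clock management Chiplet and bundle the
    per-Chiplet compensation values by recipient.

    grids: grid id -> member Chiplet ids; hosts: grid id -> Chiplet hosting the
    clock calibration sub-module. Compensation = -predicted offset (assumption)."""
    manager = {}
    for g, members in grids.items():
        for c in members:
            # The host manages itself; every other member is managed by the host.
            manager[c] = hosts[g]
    messages = {}
    for c, host in manager.items():
        messages.setdefault(host, {})[c] = -predicted_offset[c]
    return manager, messages


manager, messages = build_compensation_messages(
    grids={0: [0, 1, 2], 1: [3, 4]},
    hosts={0: 2, 1: 3},
    predicted_offset={0: 0.1, 1: -0.2, 2: 0.0, 3: 0.3, 4: 0.05})
```

Grouping the values by recipient means the global module sends one message per clock management Chiplet instead of one per Chiplet.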

In a certain embodiment, at the end of each second clock calibration period within the first clock calibration period, the clock calibration global module collects the monitoring data fed back by each clock calibration sub-module, predicts the clock offset of the next second clock calibration period from the feedback obtained after the clock offset of each Chiplet in the chip has been adjusted, determines the offset compensation for that second clock calibration period, and then sends a new calibration strategy to start a new calibration period.

The clock calibration global module sends the new calibration strategy to each clock calibration sub-module, and each clock calibration sub-module calibrates according to the new strategy, starting a new second clock calibration period.

After calibration is completed, the current second clock calibration period ends and the next one begins. The clock calibration global module again collects the monitoring results, performs feedback adjustment and offset compensation, sends a new calibration strategy, and starts a new calibration period.

This cycle repeats for every second clock calibration period within each first clock calibration period.

In one embodiment, the data collected at the end of the last second clock calibration period within the first clock calibration period is the workload and clock skew data of all Chiplets in the chip for the first clock calibration period. In other words, it is the data collected in step 2 of the present invention.

The data collected at the end of each of the other second clock calibration periods within the first clock calibration period includes the clock offset data of all Chiplets in the chip for the second clock calibration period that has just ended.
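The nested timing described above can be summarised in a short sketch. One first clock calibration period contains several second clock calibration periods. Offsets are reported after every second period, and workload data is reported only after the last one. The global module sends a new strategy after each report. The names `Report`, `measure_offset`, `measure_workload` and `send_strategy` are hypothetical stand-ins for the sub-module feedback path and the strategy broadcast, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Report:
    """Feedback collected at the end of one second clock calibration period."""
    offsets: Dict[str, float]                                   # reported every second period
    workloads: Dict[str, float] = field(default_factory=dict)  # last second period only


def run_first_period(
    chiplets: Sequence[str],
    num_second_periods: int,
    measure_offset: Callable[[str], float],    # hypothetical sub-module feedback
    measure_workload: Callable[[str], float],  # hypothetical sub-module feedback
    send_strategy: Callable[[List[Report]], None],  # hypothetical global-module hook
) -> List[Report]:
    """Run one first clock calibration period made of shorter second periods."""
    reports: List[Report] = []
    for m in range(num_second_periods):
        # (The sub-modules calibrate their grids under the current strategy here.)
        report = Report(offsets={c: measure_offset(c) for c in chiplets})
        if m == num_second_periods - 1:
            # The last second period also yields the workload data
            # used for the step-2 collection.
            report.workloads = {c: measure_workload(c) for c in chiplets}
        reports.append(report)
        # The global module predicts the next offsets, derives the compensation,
        # and pushes a new strategy before the next period starts.
        send_strategy(reports)
    return reports
```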

In one embodiment, at the end of each second clock calibration period within each first clock calibration period, the clock calibration global module collects the monitoring data fed back by the clock calibration sub-modules, including the current clock offset of each Chiplet.

The clock calibration global module predicts the clock offset for the next second clock calibration period from the historical clock offset data of each Chiplet and the monitoring data fed back by each clock calibration sub-module. It then determines the offset compensation for the next period. This includes:

The clock calibration global module predicts the clock offset of each Chiplet for the next second clock calibration period using a time series analysis model. Optionally, the time series analysis model is an ARIMA model.

The clock calibration global module determines offset compensation for the next cycle based on the predicted clock offset.

The clock calibration global module updates the clock calibration strategy based on the determined offset compensation and sends a new strategy to each clock calibration sub-module at the beginning of the next cycle.

In one embodiment, the clock calibration global module first collects and organises the clock offset data of each Chiplet. Assume that the clock offset data of a Chiplet over the past N second clock calibration periods has been collected to form a time series.

The clock calibration global module selects the ARIMA model for prediction. The ARIMA model has three main parameters: p (the order of the autoregressive term), d (the degree of differencing) and q (the order of the moving-average term). These parameters may be determined by model diagnostics and by information criteria such as AIC or BIC. For example, with the ARIMA model of Python's statsmodels library, the optimal parameter combination can be selected by iterating over different (p, d, q) combinations and comparing their AIC values.

The clock calibration global module trains the ARIMA model on past clock offset data, namely the data of the last N second clock calibration periods. Once training is complete, it uses the trained model to predict the clock offset for the next second clock calibration period.

Optionally, a separate ARIMA model is determined for each type of Chiplet.

In one embodiment, the clock calibration global module then calculates a clock offset compensation value from the predicted clock offset. The goal is to bring the clock rate of each Chiplet in the chip as close as possible to the ideal clock rate, which improves synchronisation among the Chiplets.

Optionally, the compensation value is the negative of the predicted offset. For example, if the predicted offset is +10 ns, the compensation value is -10 ns. Once the compensation is applied, the Chiplet's clock approaches the ideal rate.

The clock calibration global module may add the calculated clock offset compensation value to the clock adjustment instructions for each Chiplet.
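The optional negation rule can be sketched together with the delivery path described earlier, in which each Chiplet's compensation is sent to its clock management Chiplet. The function name and the nested-dictionary instruction format are hypothetical. The source does not define the layout of a clock adjustment instruction.

```python
from typing import Dict


def build_adjustment_instructions(
    predicted_offsets_ns: Dict[str, float],
    manager_of: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Negate each predicted offset and group the results by management Chiplet.

    Returns a hypothetical format:
    management Chiplet -> {managed Chiplet: compensation in ns}
    """
    instructions: Dict[str, Dict[str, float]] = {}
    for chiplet, offset in predicted_offsets_ns.items():
        compensation = -offset  # optional rule: compensation = -(predicted offset)
        instructions.setdefault(manager_of[chiplet], {})[chiplet] = compensation
    return instructions
```

As in the worked example above, a predicted offset of +10.0 ns for `c0` becomes a compensation of -10.0 ns. That value is placed in the instruction bundle of `c0`'s management Chiplet.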

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.

The foregoing description of the preferred embodiments of the present invention has been presented for purposes of clarity and understanding, and is not intended to limit the invention to the particular embodiments disclosed, but is intended to cover all modifications, alternatives, and improvements within the spirit and scope of the invention as outlined by the appended claims.

Claims

1. An adaptive clock gridding calibration method for a Chiplet chip, comprising the following steps:

2. The adaptive clock gridding calibration method of a Chiplet chip according to claim 1,

the configuration of the clock calibration global module comprises setting the number of clock calibration sub-modules managed by the clock calibration global module, initialising the global clock distribution and clock skew, and setting the modes by which information is reported and collected between the clock calibration global module and the clock calibration sub-modules;

3. The adaptive clock gridding calibration method of a Chiplet chip according to claim 1,

the workload and clock skew of each Chiplet in the chip are predicted by a prediction model according to the type of task executed by the Chiplet and the collected workload and clock skew data of all Chiplets in the chip, the prediction model being a nonlinear regression model.

4. The adaptive clock gridding calibration method of a Chiplet chip according to claim 3,

5. The adaptive clock gridding calibration method of a Chiplet chip according to claim 3,

for the prediction of the clock skew of each Chiplet, the nonlinear model takes 6 input features, namely the current workload current_workload, the core frequency F of the Chiplet, the power consumption P of the Chiplet, the current clock skew of the Chiplet, and their combined correlation terms, and outputs the clock skew in the next clock calibration period;

[Formulas of the nonlinear regression model not reproduced]

wherein the model input is a vector containing the six-dimensional input features;

6. The adaptive clock gridding calibration method of a Chiplet chip according to any one of claims 1-5, wherein the positions and sizes of the clock grids within the chip are adjusted based on the predicted workload and clock skew of each Chiplet within the chip, each clock grid comprising a group of Chiplets and each group comprising a plurality of Chiplets, and wherein determining the clock grids by partitioning the Chiplets of the chip comprises:

7. The adaptive clock gridding calibration method of a Chiplet chip according to claim 6, wherein,

a similarity function s(i, j) is defined to calculate the similarity between Chiplet_i and Chiplet_j. The similarity function s(i, j) is calculated from the predicted workload and the predicted clock skew of the Chiplets, which are the workload and the clock skew of each Chiplet in the next clock calibration period;

the similarity function s(i, j) is calculated as follows:

[Formula not reproduced]

wherein,

abs() represents the absolute value function;

exp() is the exponential function with the natural constant e as its base;
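The exact formula for s(i, j) is not recoverable from this text. The sketch below assumes one plausible form that uses only the two functions defined above: s(i, j) = exp(-(α·|W_i − W_j| + β·|S_i − S_j|)), where W is the predicted workload and S is the predicted clock skew. The weights `alpha` and `beta` and the whole functional form are hypothetical. They are not taken from the patent.

```python
import math


def similarity(w_i, skew_i, w_j, skew_j, alpha=1.0, beta=1.0):
    """Hypothetical s(i, j) built from abs() and exp().

    Returns 1.0 when both predictions match exactly and decays toward 0
    as the predicted workloads or skews diverge. alpha and beta are assumed weights.
    """
    return math.exp(-(alpha * abs(w_i - w_j) + beta * abs(skew_i - skew_j)))
```

This form is symmetric in i and j and bounded in (0, 1]. Such properties make a fixed similarity threshold meaningful in the grouping step.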

8. The adaptive clock gridding calibration method of a Chiplet chip according to claim 6, wherein in step b the Chiplets are grouped using the density-based DBSCAN algorithm, comprising:

step b.1: for each Chiplet, finding the Chiplets whose distance from it is smaller than a distance threshold and whose similarity to it is greater than a similarity threshold;
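Step b.1 combines two conditions into the DBSCAN neighbourhood test: proximity and similarity. Only step b.1 survives in this text, so the sketch below fills in the remaining core, border and noise expansion with standard DBSCAN. It should not be read as the patent's own later steps. Several points are assumptions: "distance" is taken to mean physical distance between Chiplet positions, and `eps`, `s_min` and `min_pts` are hypothetical stand-ins for the thresholds whose symbols were lost.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def dbscan_chiplets(
    positions: Sequence[Tuple[float, float]],  # assumed (x, y) location of each Chiplet
    sim: Callable[[int, int], float],           # similarity s(i, j)
    eps: float,                                 # hypothetical distance threshold
    s_min: float,                               # hypothetical similarity threshold
    min_pts: int,                               # standard DBSCAN density parameter
) -> List[int]:
    """Return a group label per Chiplet; -1 marks noise (ungrouped)."""
    n = len(positions)

    def neighbours(i: int) -> List[int]:
        # Step b.1: Chiplets closer than eps AND more similar than s_min.
        return [
            j for j in range(n)
            if j != i
            and math.dist(positions[i], positions[j]) < eps
            and sim(i, j) > s_min
        ]

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) + 1 < min_pts:  # count the point itself, as in standard DBSCAN
            labels[i] = -1           # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) + 1 >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels  # type: ignore[return-value]
```

Because of the similarity condition, a Chiplet that sits physically inside a group but has a very different predicted workload is left ungrouped rather than being absorbed. The test below shows this case.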

9. The adaptive clock gridding calibration method of a Chiplet chip according to claim 1, wherein in step 5, the Chiplet on which the clock calibration sub-module is deployed is selected from the plurality of Chiplets within each grid determined in step 4, comprising the following steps:

10. The adaptive clock gridding calibration method of a Chiplet chip according to claim 1, wherein in step 6, during a second clock calibration period, each clock calibration sub-module calibrates the clock of each Chiplet in its grid, the second clock calibration period being shorter than the first clock calibration period, the method comprising: