WO2023213224A1 - Method and device for unifying clock frequency - Google Patents

Method and device for unifying clock frequency

Info

Publication number
WO2023213224A1
Authority
WO
WIPO (PCT)
Prior art keywords
chip
cache
rate
phase
locked loop
Application number
PCT/CN2023/091264
Other languages
English (en)
French (fr)
Inventor
郭健
梁传增
麦纪良
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2023213224A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1689 Synchronisation and timing concerns
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0002 Serial port, e.g. RS232C
    • G06F2213/0064 Latency reduction in handling transfers

Definitions

  • the embodiments of the present application relate to the field of chip technology, and in particular, to a method and device for unifying a clock frequency.
  • serializer-deserializer SERializer and DESerializer, SerDes
  • DRAM Dynamic Random-access memory
  • main central processing unit Central Processing Unit, CPU
  • cache chip DRAM chip
  • SerDes serial interface between the main CPU chip and the cache chip
  • DIMM Dual In-line Memory Module
  • the clock clk_sds on the SerDes serial interface side in the cache chip and the DDR clock clk_ddr are usually asynchronous clocks, so the data flow needs asynchronous processing across clock domains in both the transmit (TX) direction and the receive (RX) direction.
  • Usually the system bus clock clk_bus in the main CPU chip and the working clock clk_sds of the SerDes serial interface are also asynchronous clocks, so data must also be processed asynchronously in the TX direction and the RX direction. As a result, a data flow needs to cross asynchronous clock domains four times across the TX and RX directions.
  • If the SerDes interfaces of the main CPU chip and the cache chip do not share a reference clock, a data flow needs to cross asynchronous clock domains six times, and the resulting delay is relatively large.
  • Embodiments of this application provide a method and device for unifying clock frequencies, which can unify the frequencies of SerDes and DRAM when a serial interface is used to expand memory, so that the entire system works in a synchronous clock domain. This reduces the number of times the data flow crosses asynchronous clock domains and ultimately achieves low latency.
  • the first aspect provides a method of unifying the clock frequency, which is applied to a system that expands memory through a serial interface.
  • the system includes a main chip, a cache chip and at least one memory chip coupled to the cache chip. The clock of a first phase-locked loop coupled to the serial interface in the main chip and the clock of a second phase-locked loop coupled to the serial interface in the cache chip are from the same source, and the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are both coupled to the second phase-locked loop.
  • the method includes:
  • after the system controls the main chip and the cache chip to complete serial interface initialization, it performs the first rate negotiation between the main chip and the cache chip to determine the first target rate at which the main chip accesses the cache chip;
  • after the system controls the main chip to initialize the cache chip at the first target rate, it determines the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip;
  • the system configures both the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip as the clock frequency corresponding to the second target rate, and controls the main chip and the cache chip to complete the second rate negotiation.
  • serial interface can be understood as SerDes
  • phase-locked loop can be a PLL
  • dual in-line memory module can be a DIMM
  • this application uses two rate negotiation methods to complete the rate negotiation to achieve the purpose of unifying the frequency points of DRAM and SerDes.
  • the main chip and the cache chip can first negotiate a low-frequency point rate, that is, the first target rate.
  • the main chip can initialize the cache chip and obtain information about the cache chip, that is, obtain the rate of the memory chip inserted on the cache chip.
  • When the main chip obtains the information of the cache chip, it can determine the frequency point at which DRAM and SerDes work together, which is the frequency point corresponding to the highest rate of the DIMM interface when the memory chip is inserted into the slot.
  • the main chip initializes the frequency of the PLL in the main chip and the PLL in the cache chip to the frequency corresponding to the speed of the memory chip. Based on the frequency of the initialized PLL, the main chip and the cache chip conduct a second rate negotiation. When the negotiation is successful, the DRAM and SerDes negotiate a unified target rate, that is, the second target rate. After completing two rate negotiations, service transmission can begin between the main chip, cache chip and memory chip.
  • a common reference clock is used in the main chip and the cache chip, and the maximum speed of the main chip and the cache chip is related to the maximum speed supported by the memory chip of the DIMM interface on the cache chip.
  • the transmission rates of the serial interface SerDes of the main chip and cache chip will also change simultaneously. In this way, when data spans multiple chips, it still works in the synchronous clock domain, without the delay consumption of asynchronous processing, which can reduce the additional delay of the system in asynchronous clocks.
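The two-step negotiation described above can be sketched as a small simulation. This is a minimal illustration, not the patented implementation: the class names, the `second_target_rate` register key, the rate values and the rule that negotiation succeeds when both sides run at the same rate are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CacheChip:
    dimm_rates_mts: list[int]            # rates the inserted memory supports (hypothetical values)
    pll_rate_mts: int = 0                # rate the second PLL is currently configured for
    registers: dict = field(default_factory=dict)

    def initialize(self) -> None:
        # Probe the DIMM interface and record the highest supported rate in a register.
        self.registers["second_target_rate"] = max(self.dimm_rates_mts)


@dataclass
class MainChip:
    pll_rate_mts: int = 0                # rate the first PLL is currently configured for


def negotiate(main: MainChip, cache: CacheChip) -> int:
    # Assumed success rule: both serial interfaces must run at the same rate.
    if main.pll_rate_mts != cache.pll_rate_mts:
        raise RuntimeError("rate negotiation failed")
    return main.pll_rate_mts


def unify_clock_frequency(main: MainChip, cache: CacheChip, low_rate_mts: int) -> tuple[int, int]:
    # Step 1: bring both PLLs up at a low rate and negotiate the first target rate.
    main.pll_rate_mts = cache.pll_rate_mts = low_rate_mts
    first_target = negotiate(main, cache)
    # Step 2: initialize the cache chip over the low-rate link and read back the DIMM rate.
    cache.initialize()
    second_target = cache.registers["second_target_rate"]
    # Step 3: reconfigure both PLLs for the second target rate and negotiate again.
    main.pll_rate_mts = cache.pll_rate_mts = second_target
    return first_target, negotiate(main, cache)


main = MainChip()
cache = CacheChip(dimm_rates_mts=[4800, 5200])
rates = unify_clock_frequency(main, cache, low_rate_mts=1000)
```

After the call, both PLLs in this model are configured for the same rate, which corresponds to the unified frequency point the text describes.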
  • performing the first rate negotiation between the main chip and the cache chip to determine the first target rate at which the main chip accesses the cache chip includes: controlling the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the same-source reference clock, to obtain the operating clock frequency of the first phase-locked loop and the operating clock frequency of the second phase-locked loop, where the two operating clock frequencies are the same; determining the transmission rate of the serial interface in the main chip according to the operating clock frequency of the first phase-locked loop, and determining the transmission rate of the serial interface in the cache chip according to the operating clock frequency of the second phase-locked loop; and performing the first rate negotiation between the main chip and the cache chip based on these two transmission rates and, when the negotiation is successful, determining that the transmission rate of the serial interface in the main chip and the transmission rate of the serial interface in the cache chip are the first target rate at which the main chip accesses the cache chip.
  • This is equivalent to determining the transmission rate of the main chip and the transmission rate of the cache chip, and the two transmission rates are the same. In this way, when the main chip successfully negotiates the rate with the cache chip based on the determined transmission rate, it is equivalent to determining the first target rate at which the main chip can access the cache chip. In other words, a low-frequency rate has been negotiated between the main chip and the cache chip, and the main chip can access the cache chip at this low-frequency rate.
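The frequency multiplication step can be illustrated with a short calculation. This is a sketch under stated assumptions: the 100 MHz reference, the multiplier of 10 and the ratio of 2 bits per clock cycle are hypothetical values chosen for the example, and the source does not specify how the serial rate is derived from the PLL clock.

```python
from typing import Optional


def pll_output_mhz(ref_mhz: int, multiplier: int) -> int:
    """Operating clock of a PLL that multiplies its reference clock."""
    return ref_mhz * multiplier


def serdes_rate_mbps(clk_mhz: int, bits_per_cycle: int) -> int:
    """Serial line rate derived from the PLL clock (bits_per_cycle is an assumed ratio)."""
    return clk_mhz * bits_per_cycle


REF_MHZ = 100      # hypothetical same-source reference clock
MULTIPLIER = 10    # hypothetical low-frequency setting for the first negotiation

clk_main = pll_output_mhz(REF_MHZ, MULTIPLIER)     # first PLL, main chip
clk_cache = pll_output_mhz(REF_MHZ, MULTIPLIER)    # second PLL, cache chip

rate_main = serdes_rate_mbps(clk_main, bits_per_cycle=2)
rate_cache = serdes_rate_mbps(clk_cache, bits_per_cycle=2)

# The first target rate exists only if both sides arrive at the same rate.
first_target_rate: Optional[int] = rate_main if rate_main == rate_cache else None
```

Because both PLLs multiply the same reference by the same factor, the two serial interfaces land on the same rate without any further adjustment.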
  • determining the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip includes: controlling the serial interface of the main chip to access the register of the cache chip at the first target rate to initialize the register of the cache chip; controlling the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, to determine the highest rate supported by the dual in-line memory module interface, and using that highest rate as the second target rate; and controlling the cache chip to record the second target rate in the register of the cache chip.
  • memory chips inserted into the DIMM interface of the cache chip may support different access rates.
  • the cache chip can record the access rate supported by the memory chip currently inserted into the DIMM interface in the register of the cache chip. In this way, the main chip can determine the highest rate supported by the memory chip (the second target rate) from this register and adjust the clock frequencies of the main chip and the cache chip to a clock frequency that matches the highest rate, which avoids crossing asynchronous clock domains and reduces the delay of data transmission.
  • configuring both the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to a clock frequency corresponding to the second target rate, and controlling the main chip and the cache chip to complete the second rate negotiation, includes: controlling the main chip to read the second target rate from the register of the cache chip; controlling the main chip to initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both are the clock frequency corresponding to the second target rate; and controlling the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
  • the SerDes in the main chip and the cache chip, and the DRAM interface between the cache chip and the memory chip, work at a unified clock frequency (frequency point/second target rate). Because the first PLL and the second PLL are based on the same source clock, and the DIMM interface and the serial interface in the cache chip are also clocked from the same source, the main chip avoids crossing asynchronous clock domains when it accesses the memory chip through the cache chip, which reduces the delay of data transmission.
  • the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate when negotiating rates. That is to say, when the serial interfaces of the main chip and the cache chip perform rate negotiation, the negotiation is not limited to certain fixed frequencies (such as PCIE's 8Gbps, 16Gbps, etc.).
  • the two rate negotiation processes in this application can support negotiation of any specified integer rate.
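The difference between negotiating from a fixed rate table and negotiating an arbitrary integer rate can be sketched as follows. The function names, the use of MT/s as the unit and the 1000 to 32000 range are assumptions for illustration; the fixed table uses the well-known PCIE rates expressed in MT/s.

```python
from typing import Optional

PCIE_RATES_MTPS = [2500, 5000, 8000, 16000, 32000]   # fixed PCIE rates in MT/s


def negotiate_fixed(requested_mtps: int, table: list[int]) -> Optional[int]:
    """Pick the highest table entry that does not exceed the requested rate."""
    candidates = [rate for rate in table if rate <= requested_mtps]
    return max(candidates) if candidates else None


def negotiate_integer(requested_mtps: int, min_mtps: int, max_mtps: int) -> Optional[int]:
    """Accept any integer rate inside the supported range (range limits are hypothetical)."""
    if min_mtps <= requested_mtps <= max_mtps:
        return requested_mtps
    return None


dram_rate = 4800   # a DDR5 rate in MT/s
fixed_result = negotiate_fixed(dram_rate, PCIE_RATES_MTPS)
integer_result = negotiate_integer(dram_rate, min_mtps=1000, max_mtps=32000)
```

With the fixed table the link settles on 2500 MT/s and cannot match the DRAM rate, whereas integer negotiation lands exactly on 4800 MT/s, which is what allows the serial interface and the DRAM to share one frequency point.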
  • a second aspect provides a system for expanding memory through a serial interface.
  • the system includes a main chip, a cache chip and at least one memory chip coupled to the cache chip.
  • the clock of a first phase-locked loop coupled to the serial interface in the main chip and the clock of a second phase-locked loop coupled to the serial interface in the cache chip have the same source.
  • the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are both coupled to the second phase-locked loop, where:
  • the main chip is used to complete the first rate negotiation with the cache chip after completing the serial interface initialization with the cache chip, to determine the first target rate at which the main chip accesses the cache chip; after initializing the cache chip at the first target rate, determine the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip; and configure the clock frequency of the first phase-locked loop in the main chip to a clock frequency corresponding to the second target rate. The cache chip is used to configure the clock frequency of the second phase-locked loop in the cache chip to a clock frequency corresponding to the second target rate. The main chip is also used to complete the second rate negotiation with the cache chip.
  • the main chip is used to control the first phase-locked loop to perform frequency multiplication according to a reference clock from the same source, to obtain the operating clock frequency of the first phase-locked loop;
  • the cache chip is used to control the second phase-locked loop to perform frequency multiplication based on the reference clock from the same source, to obtain the operating clock frequency of the second phase-locked loop.
  • the operating clock frequency of the first phase-locked loop is the same as the operating clock frequency of the second phase-locked loop.
  • the main chip is used to determine the transmission rate of the serial interface in the main chip according to the operating clock frequency of the first phase-locked loop; the cache chip is used to determine the transmission rate of the serial interface in the cache chip according to the operating clock frequency of the second phase-locked loop; the main chip is used to perform the first rate negotiation between the main chip and the cache chip based on the transmission rate of the serial interface of the main chip and the transmission rate of the serial interface in the cache chip; when the negotiation is successful, the main chip determines that the transmission rate of the serial interface in the main chip is the first target rate for the main chip to access the cache chip, and the cache chip determines that the transmission rate of the serial interface in the cache chip is the first target rate.
  • the main chip is used to control the serial interface of the main chip to access the register of the cache chip at the first target rate to initialize the register of the cache chip; the cache chip is used to access the memory chip connected to the cache chip through the dual in-line memory module interface, determine the highest rate supported by the dual in-line memory module interface, use that highest rate as the second target rate, and record the second target rate in a register of the cache chip.
  • the main chip is used to read the second target rate from the register of the cache chip; initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both are the clock frequency corresponding to the second target rate; and control the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
  • the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate when negotiating rates.
  • a third aspect provides a frequency control device.
  • the frequency control device is applied to a system that expands memory through a serial interface.
  • the system includes a main chip, a cache chip and at least one memory chip coupled to the cache chip.
  • the clocks of the first phase-locked loop coupled to the serial interface in the main chip and the second phase-locked loop coupled to the serial interface in the cache chip have the same source.
  • the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are both coupled to the second phase-locked loop. The frequency control device includes:
  • the rate negotiation unit is used to control the main chip and the cache chip to complete serial interface initialization, and to control the main chip and the cache chip to perform the first rate negotiation to determine the first target rate at which the main chip accesses the cache chip;
  • the rate acquisition unit is used to control the main chip to initialize the cache chip at the first target rate and determine the second target rate supported by the DIMM interface between the cache chip and the memory chip;
  • the rate negotiation unit is also used to configure both the clock frequency of the first PLL in the main chip and the clock frequency of the second PLL in the cache chip to a clock frequency corresponding to the second target rate, and to control the main chip and the cache chip to complete the second rate negotiation.
  • the rate negotiation unit is used to: control the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the same-source reference clock, to obtain the operating clock frequency of the first phase-locked loop and the operating clock frequency of the second phase-locked loop, where the two operating clock frequencies are the same; determine the transmission rate of the serial interface in the main chip according to the operating clock frequency of the first phase-locked loop, and determine the transmission rate of the serial interface in the cache chip according to the operating clock frequency of the second phase-locked loop; and perform the first rate negotiation between the main chip and the cache chip according to these two transmission rates and, when the negotiation is successful, determine that the transmission rate of the serial interface in the main chip and the transmission rate of the serial interface in the cache chip are the first target rate for the main chip to access the cache chip.
  • the rate acquisition unit is used to: control the serial interface of the main chip to access the register of the cache chip at the first target rate to initialize the register of the cache chip; control the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determine the highest rate supported by the dual in-line memory module interface, and use that highest rate as the second target rate; and control the cache chip to record the second target rate in a register of the cache chip.
  • the rate negotiation unit is used to: control the main chip to read the second target rate from the register of the cache chip; control the main chip to initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both are the clock frequency corresponding to the second target rate; and control the serial interface of the main chip and the serial interface of the cache chip to conduct the second rate negotiation at the second target rate.
  • the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate when negotiating rates.
  • a fourth aspect provides a frequency control device, including at least one processor.
  • the at least one processor is connected to a memory.
  • the at least one processor is used to read and execute programs stored in the memory, so that the device executes the method described in the first aspect or any implementation of the first aspect.
  • a fifth aspect provides a frequency control device, which includes a main chip, a cache chip and a memory chip.
  • the main chip is coupled to a memory and is used to read and execute program instructions stored in the memory to implement the method described in the first aspect or any implementation of the first aspect.
  • a sixth aspect provides a frequency control device, which is included in an electronic device and has the function of realizing the behavior of the electronic device in any of the above aspects and any possible implementation manner.
  • This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software.
  • Hardware or software includes one or more modules or units corresponding to the above functions. For example, rate negotiation module or unit and rate acquisition module or unit, etc.
  • a computer-readable storage medium is provided, including computer instructions.
  • when the computer instructions run on an electronic device, they cause the electronic device to execute the method described in the first aspect and any possible design of the first aspect.
  • a computer program product is provided.
  • when the computer program product is run on a computer or processor, it causes the computer or processor to execute the method in the first aspect and any possible implementation manner.
  • any frequency control device, computer-readable storage medium or computer program product provided above can be applied to the corresponding method provided above. For the beneficial effects it can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • Figure 1 is a schematic diagram of a solution for using SerDes to expand DRAM provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of a serial interface extended DRAM system networking provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of a system networking structure provided by an embodiment of the present application.
  • Figure 4 is a schematic flowchart of a method for unifying clock frequencies provided by an embodiment of the present application
  • Figure 5 is a schematic flowchart of a method for unifying clock frequencies provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a system for expanding memory through a serial interface provided by an embodiment of the present application
  • Figure 7 is a schematic structural diagram of a system for expanding memory through a serial interface provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a system for serial interface expansion memory provided by an embodiment of the present application.
  • DRAM Dynamic Random Access Memory, the most common system memory. DRAM can hold data for a short period of time. In order to retain data, DRAM uses capacitor storage and needs to be refreshed every once in a while. If the storage unit in DRAM is not refreshed, the stored information will be lost. For example, data will be lost when the machine is shut down.
  • DDR Double Data Rate.
  • the current DDR1-DDR5 are generations of memory. Memory of different generations has different transmission rates.
  • DDR1 represents a generation.
  • the transmission rate of DDR2 is twice that of DDR1
  • the transmission rate of DDR5 is twice that of DDR3.
  • DIMM: can be understood as a memory module (memory stick) that carries DRAM and plugs into a memory slot, providing a 64-bit data channel.
  • SerDes: a mainstream time-division-multiplexing, point-to-point serial communication technology. Multiple low-speed parallel signals are converted into a high-speed serial signal at the transmitting end, and after passing through the transmission medium, the high-speed serial signal is converted back into low-speed parallel signals at the receiving end.
  • This point-to-point serial communication technology makes full use of the channel capacity of the transmission medium, reduces the number of required transmission channels and device pins, and increases the signal transmission speed, thus greatly reducing communication costs.
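The parallel-to-serial conversion performed by SerDes can be sketched at the bit level. This minimal example assumes MSB-first ordering and omits line coding, clock recovery and framing, none of which the source describes.

```python
def serialize(words: list[int], width: int) -> list[int]:
    """Flatten parallel words into a single serial bit stream, MSB first."""
    return [(word >> i) & 1 for word in words for i in range(width - 1, -1, -1)]


def deserialize(bits: list[int], width: int) -> list[int]:
    """Rebuild parallel words from the serial bit stream."""
    words = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        words.append(value)
    return words


parallel_in = [0xA5, 0x3C]                  # two 8-bit parallel words
serial_bits = serialize(parallel_in, width=8)
parallel_out = deserialize(serial_bits, width=8)
```

The receiving end recovers the original words only if it uses the same word width and bit order as the transmitting end.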
  • Synchronous and asynchronous clocks: when the phase relationship between two clocks is fixed, the two clocks can be called synchronous clocks. Clocks from the same source, such as two clocks generated by the same mixed mode clock manager (Mixed Mode Clock Manager, MMCM) or phase locked loop (Phase Locked Loop, PLL), can generally be called synchronous clocks. Therefore, a master clock and its derived clocks can be constrained to the same clock group. When the phase relationship between two clocks cannot be determined, the two clocks can be called asynchronous clocks. Two clocks from different crystal oscillators are asynchronous clocks. Normally, different master clocks in a design are asynchronous clocks, so two master clocks and their respective derived clocks can be constrained into different clock groups.
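The grouping of a master clock with its derived clocks can be sketched with a small derivation tree. The clock names and the `parent` mapping are hypothetical, and the sketch uses only the same-source criterion from the text; it does not model phase relationships.

```python
from typing import Optional


def root_source(clock: str, parent: dict[str, Optional[str]]) -> str:
    """Follow the derivation chain up to the master clock."""
    while parent[clock] is not None:
        clock = parent[clock]
    return clock


def clock_groups(parent: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Group every clock under the master clock it is derived from."""
    groups: dict[str, list[str]] = {}
    for clock in parent:
        groups.setdefault(root_source(clock, parent), []).append(clock)
    return groups


def same_group(a: str, b: str, parent: dict[str, Optional[str]]) -> bool:
    """Clocks derived from the same master are treated as synchronous here."""
    return root_source(a, parent) == root_source(b, parent)


# Hypothetical design: two crystal oscillators, each with derived clocks.
parent = {
    "osc_a": None,
    "clk_a": "osc_a",
    "clk_a_div2": "clk_a",
    "osc_b": None,
    "clk_b": "osc_b",
}
groups = clock_groups(parent)
```

In this model `clk_a` and `clk_a_div2` fall into one clock group, while `clk_a` and `clk_b` come from different oscillators and land in different groups.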
  • MMCM Mixed Mode Clock Manager
  • PLL Phase Locked Loop
  • PLL: uses an externally input reference signal to control the frequency and phase of the oscillation signal inside the loop. Because a PLL makes the output signal frequency automatically track the input signal frequency, it is usually used in closed-loop tracking circuits.
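The closed-loop tracking behaviour can be illustrated with a toy model. This is a deliberately simplified first-order loop that tracks frequency only; a real PLL compares phase and includes a loop filter. The gain, step count and frequencies are hypothetical.

```python
def track_frequency(ref_mhz: float, multiplier: int, start_mhz: float,
                    gain: float = 0.5, steps: int = 40) -> float:
    """Toy feedback loop: drive the divided-down output toward the reference."""
    out = start_mhz
    for _ in range(steps):
        feedback = out / multiplier          # output divided down for comparison
        error = ref_mhz - feedback           # difference from the reference
        out += gain * multiplier * error     # nudge the oscillator toward lock
    return out


locked_mhz = track_frequency(ref_mhz=100.0, multiplier=24, start_mhz=1000.0)
```

Each iteration halves the remaining error in this model, so the output settles at the reference frequency multiplied by the divider ratio.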
  • first and second are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include one or more of these features. In the description of this embodiment, unless otherwise specified, “plurality” means two or more.
  • Standard DDR modules usually adopt a 64-bit structure and can transmit 64-bit binary data at a time, corresponding to a 64-bit parallel memory bus.
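The peak bandwidth of such a 64-bit parallel bus follows directly from the transfer rate. The sketch below is plain arithmetic; the function name is illustrative, and the result is a theoretical peak that ignores protocol overhead.

```python
def peak_bandwidth_gb_per_s(transfer_rate_gtps: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth = transfers per second x bytes moved per transfer."""
    return transfer_rate_gtps * bus_width_bits / 8


ddr5_4800_bandwidth = peak_bandwidth_gb_per_s(4.8)   # 4.8 GT/s on a 64-bit bus
```

For a 4.8 GT/s module, each transfer moves 8 bytes, giving a peak of 38.4 GB/s.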
  • parallel buses can easily interfere with each other, making the transmission signal unstable, and it is difficult to increase the frequency quickly.
  • the slow and step-by-step improvement of memory specifications is not only due to market consumption considerations, but also due to technical realities.
  • the parallel data sent by the memory module needs to arrive at the receiving end synchronously in the same transmission beat, which requires the 64 traces on the Printed Circuit Board (PCB) to be strictly equal in length and imposes strict requirements on PCB design.
  • OMI Open Memory Interface
  • chip 1 is the main chip (main CPU chip)
  • chip 2 is the cache chip
  • chip 3 is the memory chip. Multiple chips 3 can be coupled to one chip 2 to expand memory.
  • the interface between chip 1 and chip 2 is a SerDes serial interface
  • the interface between chip 2 and chip 3 is a DIMM parallel interface.
  • Chip 1 includes the bus clock clk_bus and the logic clock clk_logic.
  • in chip 1, the reference clock of PLL1, which generates the working clock clk_sds1 on the SerDes1 (serial interface 1) interface side, is REF_CLK1.
  • in chip 2, the reference clock of PLL2, which generates the working clock clk_sds2 on the SerDes2 (serial interface 2) interface side, is REF_CLK2.
  • the PLL3 coupled to the DIMM interface in chip 2 generates the working clock clk_ddr that drives the DDR physical interface (PHY) 2. The reference clock of clk_ddr and the reference clock of clk_sds2 are both REF_CLK2.
  • the DIMM interface has transmission rates corresponding to various frequency points in the application scenario.
  • the transmission rates of DDR5 include 4.8GT/s, 5.2GT/s, 6.8GT/s, and 8.4GT/s.
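Each of these transfer rates corresponds to a clock frequency point. Because double data rate memory transfers data on both edges of the clock, the I/O clock is half the transfer rate. The sketch below tabulates this for the rates listed above; the variable and function names are illustrative.

```python
DDR5_RATES_MTPS = [4800, 5200, 6800, 8400]   # transfer rates from the text, in MT/s


def io_clock_mhz(transfer_rate_mtps: int) -> int:
    """Double data rate: two transfers per clock cycle, so clock = rate / 2."""
    return transfer_rate_mtps // 2


frequency_points = {rate: io_clock_mhz(rate) for rate in DDR5_RATES_MTPS}
```

For example, 4800 MT/s corresponds to a 2400 MHz I/O clock.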
  • chip 1 can determine the operating frequency of the DRAM. Only then can chip 1 configure PLL3 in chip 2 so that the working clock clk_ddr driving DDR PHY2 matches the operating frequency of the DRAM.
  • current serial protocols usually work at fixed frequencies. For example, the Peripheral Component Interconnect Express (PCIE) protocol, a high-speed serial computer expansion bus standard, usually has multiple fixed transmission rates, such as 2.5GT/s, 5GT/s, 8GT/s, 16GT/s and 32GT/s.
  • clk_sds2 and clk_ddr in chip 2 are usually asynchronous clocks, and the data flow needs to cross the clock domain in the TX direction and the RX direction respectively; that is, in Figure 1 the data flow must be processed asynchronously in the TX direction indicated by 3 and the RX direction indicated by 4.
  • the system bus clock clk_bus in chip 1 and the working clock clk_sds1 of the serial interface SerDes1 in chip 1 are also asynchronous clocks.
  • If SerDes1 in chip 1 and SerDes2 in chip 2 share the reference clock, and the frequencies of clk_sds1 and clk_sds2 are unified, the data stream does not need to be processed asynchronously when passing through the TX direction indicated by 2 and the RX direction indicated by 5. If SerDes1 in chip 1 and SerDes2 in chip 2 do not share the reference clock (PLL1 uses REF_CLK1 and PLL2 uses REF_CLK2), the data flow needs to be processed asynchronously when passing through the TX direction indicated by 2 and the RX direction indicated by 5.
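The crossing counts for Figure 1 can be tallied with a short sketch. The clock names come from the text; the source labels, the frequencies and the rule that two clocks count as synchronous only when they share a source and run at the same frequency are simplifying assumptions for illustration.

```python
from typing import NamedTuple


class Clock(NamedTuple):
    name: str
    source: str      # reference the clock is derived from (labels are hypothetical)
    freq_mhz: int    # operating frequency (values are hypothetical)


def is_async(a: Clock, b: Clock) -> bool:
    # Simplifying assumption: synchronous only with the same source and frequency.
    return a.source != b.source or a.freq_mhz != b.freq_mhz


def count_async_crossings(tx_path: list[Clock]) -> int:
    """Count asynchronous crossings for a round trip: TX path plus the reversed RX path."""
    one_way = sum(is_async(a, b) for a, b in zip(tx_path, tx_path[1:]))
    return 2 * one_way


# SerDes1 and SerDes2 share a reference clock and frequency; bus and DDR clocks differ.
shared_ref = [
    Clock("clk_bus", "BUS_OSC", 1000),
    Clock("clk_sds1", "REF_CLK", 800),
    Clock("clk_sds2", "REF_CLK", 800),
    Clock("clk_ddr", "REF_CLK", 2400),
]
# SerDes1 and SerDes2 use separate reference clocks.
separate_ref = [
    Clock("clk_bus", "BUS_OSC", 1000),
    Clock("clk_sds1", "REF_CLK1", 800),
    Clock("clk_sds2", "REF_CLK2", 800),
    Clock("clk_ddr", "REF_CLK2", 2400),
]
# Unified case: every clock on the path comes from one source at one frequency point.
unified = [Clock(name, "REF_CLK", 2400)
           for name in ("clk_bus", "clk_sds1", "clk_sds2", "clk_ddr")]

crossings = {
    "shared_ref": count_async_crossings(shared_ref),
    "separate_ref": count_async_crossings(separate_ref),
    "unified": count_async_crossings(unified),
}
```

Under these assumptions the model gives four crossings with a shared SerDes reference clock, six with separate reference clocks, and none once every clock on the path is unified.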
  • this application provides a method and device for unifying the clock frequency, which can solve the problem of unifying the frequencies of DRAM and SerDes when using a serial interface to expand memory, so that the entire system works in a synchronous clock domain and reduces the number of times the data flow crosses asynchronous clocks. , and ultimately achieve the goal of low latency.
  • this application uses two rate negotiation methods to complete the rate negotiation to achieve the purpose of unifying the frequency points of DRAM and SerDes.
  • the main chip and the cache chip can first negotiate to a low-frequency point rate.
  • at this rate, the main chip can initialize the cache chip and obtain the information of the cache chip, that is, the rate of the memory chip inserted into the cache chip.
  • once the main chip has obtained the information of the cache chip, it can determine the frequency point at which the DRAM and the SerDes work together, which is the frequency point corresponding to the highest rate of the DIMM interface when the memory chip is inserted into the slot.
  • the main chip initializes the frequency of the PLL in the main chip and the PLL in the cache chip to the frequency corresponding to the speed of the memory chip. Based on the frequency of the initialized PLL, the main chip and the cache chip conduct a second rate negotiation. When the negotiation is successful, the DRAM and SerDes negotiate a unified target rate. After completing two rate negotiations, service transmission can begin between the main chip, cache chip and memory chip.
  • a common reference clock is used in the main chip and the cache chip, and the maximum speed of the main chip and the cache chip is related to the maximum speed supported by the memory chip of the DIMM interface on the cache chip.
  • the transmission rates of the serial interface SerDes of the main chip and cache chip will also change simultaneously. In this way, when data spans multiple chips, it still works in the synchronous clock domain, without the delay consumption of asynchronous processing, which can reduce the additional delay of the system in asynchronous clocks.
  • usually, there is a certain multiple relationship between the transmission rate and the transmission frequency (frequency point).
  • for example, consider the "quad pumped" technique used in the front side bus (FSB).
  • a quad-pumped bus transfers data four times in each bus clock cycle.
  • the data transfer rate of the bus is therefore equal to 4 times the bus clock frequency.
  • if the bus has a clock frequency of 333MHz,
  • the data transfer rate of the bus is 1332MT/s, that is, 1.332GT/s.
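  • the multiple relationship above can be written as a small Python sketch. The function names are illustrative only and are not part of this application.

```python
def transfer_rate_mts(bus_clock_mhz: float, transfers_per_cycle: int) -> float:
    """Data transfer rate in MT/s for a bus that moves data several times per clock cycle."""
    return bus_clock_mhz * transfers_per_cycle


def mts_to_gts(rate_mts: float) -> float:
    """Convert megatransfers per second to gigatransfers per second."""
    return rate_mts / 1000


# Quad pumped: 4 transfers per bus clock cycle, 333MHz bus clock.
rate = transfer_rate_mts(333, 4)
print(rate, "MT/s =", mts_to_gts(rate), "GT/s")
```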
  • the method of unifying the clock frequency provided by this application can be applied to a system network that expands DRAM through a serial interface.
  • the network structure can be shown in Figure 2, including three types of chips. Among them, chip 1 is the main chip, or main CPU chip; chip 2 is a cache chip; chip 3 is a DRAM memory chip, including chip 3_0, chip 3_1, chip 3_2 and chip 3_3 shown in Figure 2. Chip 1 is located on the motherboard, and chip 2 and chip 3 are located on the daughter board.
  • the networking shown in Figure 2 is only schematic; chip 2 and chip 3 can also be installed directly on the motherboard.
  • the interface between chip 1 and chip 2 is a serial interface (SerDes), and the interface between chip 2 and chip 3 is a DIMM interface (DIMM parallel interface).
  • Chip 1 can expand more DIMM interfaces through chip 2, that is, the number of chips 3 is not limited to the four shown in Figure 2. This way of expanding DRAM through the serial interface can expand more DIMM interfaces in chip 2, thereby expanding the DRAM capacity.
  • if the bandwidth of the serial interface is greater than the bandwidth of a directly connected DIMM interface, the number of chips 3 coupled to chip 2 can also be increased, so that chip 1 obtains a larger DRAM access bandwidth.
  • chip 2 serves as a cache chip, and its function is to expand the memory so that chip 1 can access more chip 3.
  • chip 1 may also have at least two serial interfaces, and one serial interface is coupled to one chip 2 to obtain larger DRAM capacity.
  • clk_bus is the bus clock in chip 1;
  • PLL1 is the phase-locked loop that sets the clock frequency of the serial interface SerDes1 in chip 1;
  • clk_sds1 is the clock with which the serial interface SerDes1 in chip 1 sends and receives data;
  • PLL2 is the phase-locked loop that sets the clock frequency of the serial interface SerDes2 in chip 2;
  • clk_sds2 is the clock with which the serial interface SerDes2 in chip 2 sends and receives data;
  • clk_ddr is the clock of the DDR PHY2 in chip 2, or the clock of the DIMM interface in chip 2.
  • the serial interfaces SerDes1 and SerDes2 of chip 1 and chip 2 use a common reference clock REF_CLK.
  • this application can configure the clock sources of PLL1 and PLL2 to the same reference clock REF_CLK through software. With PLL1 and PLL2 sharing a source, SerDes1 in chip 1 and SerDes2 in chip 2 can work at the same frequency, and PLL1 and PLL2 lock to the same frequency point.
  • the clock configuration process may be implemented through chip 1. Since PLL1 and PLL2 share the reference clock, the frequencies of clk_sds1 and clk_sds2 are the same, with only a phase difference. Therefore, when the data output by SerDes1 of chip 1 is sampled at SerDes2 of chip 2, only the sampling phase needs to be adjusted, and no cross-clock-domain asynchronous processing is required.
  • clk_sds2 and clk_ddr in chip 2 both come from PLL2.
  • clk_sds2 and clk_ddr either have the same frequency or are related by a frequency multiple.
  • clk_sds2 and clk_ddr are in the same clock domain. Therefore, the digital logic of the controller of the serial interface (not shown in Figure 3) and the digital logic of the controller of the DDR are in a synchronous clock domain, and there is no need for cross-clock asynchronous processing.
  • the method of unifying the clock frequency in this application is introduced below. Refer to Figure 4.
  • This method is applied to a system that expands memory through a serial interface.
  • the system includes a main chip, a cache chip, and at least one memory chip coupled with the cache chip.
  • the clocks of the first PLL coupled to the serial interface in the main chip and the second PLL coupled to the serial interface in the cache chip have the same source. Both the DIMM interface in the cache chip and the serial interface in the cache chip are coupled to the second PLL.
  • the method includes:
  • after the system controls the main chip and the cache chip to complete the serial interface initialization, it performs the first rate negotiation between the main chip and the cache chip to determine the first target rate at which the main chip accesses the cache chip.
  • the main chip in Figure 4 is equivalent to chip 1 in Figures 2 and 3.
  • the cache chip in Figure 4 is equivalent to chip 2 in Figures 2 and 3.
  • the memory chip in Figure 4 is equivalent to chip 3 in Figures 2 and 3.
  • the first PLL in FIG. 4 is equivalent to PLL1 in FIG. 3.
  • the second PLL in FIG. 4 is equivalent to PLL2 in FIG. 3.
  • the main chip can control the main chip and the cache chip to complete the initialization of the serial interface.
  • the main chip can control the first PLL and the second PLL to configure their clock frequencies from the same-source reference clock REF_CLK.
  • the clock frequency of the first PLL is the same as the clock frequency of the second PLL, so that the serial interface SerDes1 of the main chip and the serial interface SerDes2 of the cache chip have the same rate, both of which are the first target rate.
  • the main chip can control the serial interface SerDes1 of the main chip and the serial interface SerDes2 of the cache chip to start rate negotiation and perform the first rate negotiation. If the negotiation is successful, the main chip determines that the rate at which the main chip accesses the cache chip is the first target rate.
  • after the system controls the main chip to initialize the cache chip at the first target rate, it determines the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip.
  • the main chip can access the registers in the cache chip through the serial interfaces SerDes1 and SerDes2, and initialize the registers in the cache chip so that the registers in the cache chip can be accessed, including read operations, write operations, and rewrite operations.
  • the main chip can also initialize the DIMM interface in the cache chip, obtain the highest rate supported by the memory chip inserted in the DIMM interface of the cache chip, and record that rate in a register of the cache chip. The highest rate supported by the memory chip is referred to here as the second target rate.
  • the system configures the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controls the main chip and the cache chip to complete the second rate negotiation.
  • the main chip can access the register of the cache chip to obtain the second target rate, configure the clock frequencies of the first PLL and the second PLL to the clock frequency corresponding to the second target rate, and then have the serial interface SerDes1 of the main chip and the serial interface SerDes2 of the cache chip conduct a second rate negotiation at the second target rate. After the negotiation succeeds, the frequency unification process ends.
  • the SerDes and the DRAM in the system then work at a unified frequency.
  • in this application, the first PLL in the main chip and the second PLL in the cache chip have the same source clock, and the clock of the DRAM is derived from the second PLL. On this basis, two rate negotiations can unify the clock frequencies of the first PLL, the second PLL and the DRAM.
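  • the three stages of the method (first rate negotiation, determining the second target rate, second rate negotiation) can be sketched in Python as follows. This is a toy model under stated assumptions, not the implementation of this application: the classes `Chip` and `CacheChip`, the field `pll_rate_mts`, the functions `negotiate` and `unify_clock_frequency`, and the example rates are all hypothetical, and a negotiation is assumed to succeed exactly when both PLLs are configured for the same rate.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    pll_rate_mts: int = 0  # link rate (MT/s) the chip's PLL is currently configured for


@dataclass
class CacheChip(Chip):
    dimm_max_rate_mts: int = 0  # highest rate supported by the inserted memory chip


def negotiate(main: Chip, cache: Chip) -> int:
    """Toy rate negotiation: succeeds only if both PLLs are set to the same rate."""
    if main.pll_rate_mts != cache.pll_rate_mts:
        raise RuntimeError("rate negotiation failed")
    return main.pll_rate_mts


def unify_clock_frequency(main: Chip, cache: CacheChip, low_rate_mts: int):
    # Stage 1: bring both PLLs up at a low rate and negotiate the first target rate.
    main.pll_rate_mts = cache.pll_rate_mts = low_rate_mts
    first_target = negotiate(main, cache)

    # Stage 2: at the first target rate, learn the highest rate the DIMM interface supports.
    second_target = cache.dimm_max_rate_mts

    # Stage 3: reconfigure both PLLs for the second target rate and negotiate again.
    main.pll_rate_mts = cache.pll_rate_mts = second_target
    return first_target, negotiate(main, cache)


main_chip = Chip("main")
cache_chip = CacheChip("cache", dimm_max_rate_mts=32000)  # illustrative value
print(unify_clock_frequency(main_chip, cache_chip, low_rate_mts=8000))
```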
  • This application provides a method for unifying the clock frequency, as shown in Figure 5.
  • the method includes:
  • the system determines that the main chip, cache chip and at least one memory chip are powered on.
  • when the system is in a terminal device and the user powers on the terminal device, the main chip, the cache chip and the at least one memory chip on the motherboard are powered on once the motherboard is powered on.
  • the system controls the main chip and the cache chip to complete the serial interface initialization.
  • the main chip controls the first PLL in the main chip and the second PLL in the cache chip to perform frequency multiplication based on the same-source reference clock, obtaining the clock frequency of the first PLL and the clock frequency of the second PLL. The clock frequency at which the first PLL operates is the same as the clock frequency at which the second PLL operates.
  • the transmission rate of the serial interface 1 (SerDes1) in the main chip is determined according to the clock frequency at which the first PLL operates, and the transmission rate of the serial interface 2 (SerDes2) in the cache chip is determined according to the clock frequency at which the second PLL operates.
  • the main chip includes a logic controller 1 and the cache chip includes a logic controller 2 .
  • the logic controller 1 can send a clock frequency configuration instruction to PLL1 in the serial interface 1, and PLL1 can perform frequency multiplication based on the base clock frequency of the reference clock REF_CLK to obtain the clock frequency at which PLL1 operates.
  • Logic controller 1 can also send clock frequency configuration instructions to PLL2 in serial interface 2 through serial interface 1.
  • PLL2 performs frequency multiplication according to the reference clock frequency provided by reference clock REF_CLK to obtain the clock frequency for PLL2 operation.
  • the serial interface 1 can determine the transmission rate of the serial interface 1 based on the clock frequency of PLL1, and the serial interface 2 can determine the transmission rate of the serial interface 2 based on the clock frequency of PLL2.
  • for example, the clock frequency provided by the reference clock REF_CLK is 500MHz, and both PLL1 and PLL2 multiply it by 4, so that the clock frequencies of PLL1 and PLL2 are both 2000MHz.
  • if the transmission rate of the serial interface is by default 4 times the clock frequency of the PLL, then the transmission rates of serial interface 1 and serial interface 2 are both 8000MT/s, that is, 8GT/s.
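  • the arithmetic of this example can be checked with a short Python sketch. The class `Pll` and the function `serdes_rate_mts` are illustrative names, not part of this application; the 500MHz reference, the multiplier of 4 and the ratio of 4 transfers per PLL clock are the values used in the example above.

```python
from dataclasses import dataclass


@dataclass
class Pll:
    ref_clk_mhz: float  # frequency of the reference clock REF_CLK
    multiplier: int     # frequency multiplication factor

    @property
    def clock_mhz(self) -> float:
        """Output clock frequency after frequency multiplication."""
        return self.ref_clk_mhz * self.multiplier


def serdes_rate_mts(pll: Pll, transfers_per_clock: int = 4) -> float:
    """Serial interface transmission rate in MT/s derived from the PLL clock."""
    return pll.clock_mhz * transfers_per_clock


REF_CLK_MHZ = 500
pll1 = Pll(REF_CLK_MHZ, 4)  # main chip
pll2 = Pll(REF_CLK_MHZ, 4)  # cache chip, same reference clock

print(pll1.clock_mhz, serdes_rate_mts(pll1), serdes_rate_mts(pll2))
```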
  • the system controls the main chip and the cache chip to complete the first rate negotiation and determine the first target rate for the main chip to access the cache chip.
  • the main chip can perform the first rate negotiation between the main chip and the cache chip according to the transmission rate of the serial interface 1 in the main chip and the transmission rate of the serial interface 2 in the cache chip. When the negotiation is successful, the transmission rate of the serial interface in the main chip and the transmission rate of the serial interface in the cache chip are determined as the first target rate for the main chip to access the cache chip.
  • the rate negotiation process may be similar to the existing rate negotiation process.
  • the logic controller 1 in the main chip controls the serial interface 1 to send a message to the serial interface 2 in the cache chip at a transmission rate of 8GT/s. If the serial interface 2 receives the message at a transmission rate of 8GT/s and sends a response message indicating successful reception to the serial interface 1, the first rate negotiation is successful.
  • the main chip can determine that the first target rate for the serial interface 1 to access the cache chip is 8GT/s.
  • the main chip can start accessing the cache chip at the first target rate, for example to read, write or rewrite the memory chip through the cache chip.
  • the first target rate at this time can be understood as the low-frequency rate.
  • at this point, the main chip has completed the low-speed negotiation process with the cache chip.
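  • the message-and-response exchange of the first rate negotiation can be sketched as follows. This is a simplified model, not the negotiation protocol of this application: the class `SerialInterface`, its `receive` method and the function `first_rate_negotiation` are hypothetical, and reception is assumed to succeed exactly when the sender and the receiver run at the same rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerialInterface:
    name: str
    rate_gts: float  # transmission rate derived from this interface's PLL clock

    def receive(self, message: str, sent_at_gts: float) -> bool:
        """Return True (the 'received successfully' response) only if the rates match."""
        return sent_at_gts == self.rate_gts


def first_rate_negotiation(sds1: SerialInterface, sds2: SerialInterface) -> Optional[float]:
    """Return the first target rate in GT/s, or None if the negotiation fails."""
    ok = sds2.receive("negotiate", sent_at_gts=sds1.rate_gts)
    return sds1.rate_gts if ok else None


serdes1 = SerialInterface("SerDes1", 8)  # main chip
serdes2 = SerialInterface("SerDes2", 8)  # cache chip
print(first_rate_negotiation(serdes1, serdes2))
```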
  • the system controls the main chip to access the register of the cache chip at the first target rate and initializes the register of the cache chip.
  • the main chip can access the registers in the cache chip through serial interface 1 and serial interface 2 to initialize the registers in the cache chip, so that the registers in the cache chip are in a working state.
  • the main chip can send a register configuration initialization instruction to the logic controller 2 through the serial interface 1 and the serial interface 2, and the logic controller 2 initializes the registers in the cache chip according to the instruction.
  • the cache chip is a buffer chip and normally may have no CPU of its own. Therefore, at the first target rate, the CPU in the main chip needs to configure the registers in the cache chip so that they work. For this purpose, this application can use customized messages to support the main chip in accessing the registers in the cache chip.
  • controlling the initialization of each DIMM interface in the cache chip can be understood as controlling the cache chip to access, through the DIMM interface, the memory chip connected to the cache chip, in order to determine the highest rate supported by the DIMM interface and use that highest rate as the second target rate.
  • the logic controller 2 in the cache chip can control the DIMM interface DDR PHY2 in the cache chip to communicate with the DIMM interface DDR PHY3 of the memory chip, chip 3, so that the logic controller 2 determines the type of chip 3 inserted in the slot of DDR PHY2 and the highest rate supported by that type.
  • Logic controller 2 records this maximum rate in a register of the cache chip. This register is, for example, REG.max_dimm_rate, which is used to save the transmission rate.
  • the system controls the main chip to read the second target rate from the register of the cache chip.
  • the cache chip When the cache chip saves the highest rate supported by the DIMM interface, it can notify the main chip through serial interface 2 that the highest rate determination is completed.
  • Logic controller 1 in the main chip again accesses the register of the cache chip through serial interface 1 and serial interface 2, for example the above REG.max_dimm_rate, reads the highest rate supported by the DIMM interface stored in that register, and determines that highest rate as the second target rate.
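  • the register hand-off described above can be sketched in Python. Only the register name REG.max_dimm_rate comes from the description; the class `CacheChipRegisters`, the two functions and the type-to-rate table are hypothetical, and the serial link between the chips is modelled as a direct method call.

```python
# Hypothetical lookup table: memory-chip type -> highest supported rate in MT/s.
MAX_RATE_BY_TYPE_MTS = {"DDR5-4800": 4800, "DDR5-5200": 5200}


class CacheChipRegisters:
    """Minimal register file of the cache chip."""

    def __init__(self):
        self._regs = {}

    def write(self, name: str, value: int) -> None:
        self._regs[name] = value

    def read(self, name: str) -> int:
        return self._regs[name]


def record_max_dimm_rate(regs: CacheChipRegisters, inserted_type: str) -> None:
    """Logic controller 2 side: store the highest rate of the inserted memory chip."""
    regs.write("REG.max_dimm_rate", MAX_RATE_BY_TYPE_MTS[inserted_type])


def read_second_target_rate(regs: CacheChipRegisters) -> int:
    """Logic controller 1 side: read the register to obtain the second target rate."""
    return regs.read("REG.max_dimm_rate")


regs = CacheChipRegisters()
record_max_dimm_rate(regs, "DDR5-5200")
print(read_second_target_rate(regs))
```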
  • the rate value of the second target rate is greater than the rate value of the first target rate.
  • the first target rate is determined through low-speed startup negotiation between the main chip and the cache chip. The second target rate is the rate that, after the main chip starts, the serial interfaces jointly negotiate for the main chip to access the memory chip through the cache chip.
  • the system controls the main chip to initialize the clock frequency of the first PLL and the clock frequency of the second PLL according to the second target rate, so that both clock frequencies correspond to the second target rate.
  • the logic controller 1 can reinitialize PLL1 according to the second target rate, and the logic controller 2 can reinitialize PLL2 according to the second target rate, that is, redetermine the clock frequencies of PLL1 and PLL2.
  • for example, the second target rate is 32GT/s, that is, the highest rate supported by the DIMM interface is 32GT/s.
  • the logic controller 1 can determine that the clock frequency of PLL1 after re-initialization is 8000MHz, and the logic controller 2 can determine that the clock frequency of PLL2 after re-initialization is 8000MHz.
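  • the re-initialized PLL clock can be derived from the second target rate with the inverse of the rate calculation. In the Python sketch below, the ratio of 4 transfers per PLL clock and the 500MHz reference clock are taken from the earlier example in this description; the function names are hypothetical, and the resulting multiplier is plain arithmetic that the description does not state explicitly.

```python
def pll_clock_mhz_for_rate(rate_gts: float, transfers_per_clock: int = 4) -> float:
    """PLL clock frequency (MHz) needed for a given serial interface rate (GT/s)."""
    return rate_gts * 1000 / transfers_per_clock


def pll_multiplier(target_clock_mhz: float, ref_clk_mhz: float) -> int:
    """Frequency multiplication factor that produces the target clock from REF_CLK."""
    multiplier, remainder = divmod(target_clock_mhz, ref_clk_mhz)
    if remainder != 0:
        raise ValueError("target clock is not an integer multiple of the reference clock")
    return int(multiplier)


target_mhz = pll_clock_mhz_for_rate(32)  # second target rate of 32GT/s
print(target_mhz, pll_multiplier(target_mhz, 500))
```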
  • the system controls the serial interface of the main chip and the serial interface of the cache chip to conduct the second rate negotiation at the second target rate.
  • following step 507, when the clock frequencies of PLL1 and PLL2 have been re-initialized to 8000MHz, the transmission rate of serial interface 1 and the transmission rate of serial interface 2 are both the second target rate of 32GT/s.
  • the logic controller 1 can control the serial interface 1 and the serial interface 2 to perform the second rate negotiation.
  • logic controller 1 controls serial interface 1 to send a message to serial interface 2 at a transmission rate of 32GT/s, and serial interface 2 receives the message at a transmission rate of 32GT/s. If the message is received successfully, serial interface 2 sends a negotiation-success response message to serial interface 1.
  • at this point, the SerDes in the main chip and the cache chip, and the DRAM between the cache chip and the memory chip, work at a unified clock frequency (frequency point). Because PLL1 and PLL2 use same-source clocks, and clk_ddr and clk_sds2 are also based on the same source clock, the main chip avoids crossing asynchronous clock domains when it accesses the memory chip through the cache chip, which reduces data transmission delay.
  • when the serial interface of the main chip and the serial interface of the cache chip perform rate negotiation, they support negotiation at any specified integer rate and are not limited to a few fixed rates. For example, they are not limited to the GT/s and 16GT/s rates supported by PCIE.
  • in this way, the operating frequency point (clock frequency) of the DRAM and the operating frequency point of the SerDes can be unified.
  • the transmission parameters of DDR5 and SerDes can be seen in Table 1.
  • DDR TYPE indicates the type of DRAM is DDR5
  • DDR Data Rate indicates the transmission data rate of DDR5 in MT/s
  • DDR PHY CLK indicates the clock frequency of the DDR5 DIMM interface
  • DDRC CLK indicates the clock frequency of the DDR controller of DDR5
  • SerDes data CLK represents the clock frequency of SerDes data transmission. It can be seen that the clock frequency of DDR PHY CLK is 2 times the clock frequency of SerDes data CLK. Assume that the first target rate of the first rate negotiation is 2100MT/s in SerDes data CLK.
  • the clock frequency of the DDR PHY CLK can then be obtained through the second rate negotiation, that is, the second target rate can be obtained, for example, 4200MT/s. In other words, this application can unify the clock frequency of SerDes data CLK to the DDR PHY CLK clock frequency of 4200MT/s, realizing the unification of the clock frequencies of the SerDes and the DRAM.
  • the system controls business transmission between the main chip, cache chip and memory chip.
  • the main chip and the cache chip use a common reference clock REF_CLK
  • the clock domain of the DIMM interface of the cache chip and the serial interface SerDes of the cache chip are from the same source
  • the highest rate finally selected by the main chip and the cache chip, that is, the second target rate, is related to the highest rate supported by the memory chip inserted into the DIMM interface on the cache chip.
  • the transmission rate of the serial interface SerDes will also change synchronously, which is equivalent to unifying the clock frequencies of SerDes and DRAM.
  • the main chip accesses the memory chip through the cache chip, that is, when the data flow spans multiple chips, the data flow is transmitted in the synchronous clock domain, without the consumption of asynchronous processing delay, which can reduce the data transmission delay.
  • the additional delay of the entire system can be controlled to 10ns.
  • the rate unification process in this application occurs during system startup after power-on, but it is not limited to the startup process. The frequency unification process can also be executed during normal system communication by means of packet capture. For example, the main chip and the cache chip conduct the first rate negotiation and the second rate negotiation during data packet transmission to achieve rate unification.
  • the system for expanding the memory through the serial interface includes corresponding hardware and/or software modules for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions in conjunction with the embodiments for each specific application, but such implementations should not be considered to be beyond the scope of this application.
  • This embodiment can divide the system that expands memory through a serial interface into functional modules according to the above method example.
  • for example, each functional module can be divided to correspond to one function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • FIG. 7 shows a possible composition diagram of the system 70 for expanding memory through a serial interface involved in the above embodiment.
  • the system 70 for expanding memory through a serial interface may include: a rate negotiation unit 701, a rate acquisition unit 702 and an initialization unit 703.
  • the rate negotiation unit 701 can be used to support the system 70 for expanding memory through a serial interface in performing the above-mentioned step 401, step 403, step 503, step 508, step 509, etc., and/or other processes for the technology described herein.
  • the rate acquisition unit 702 can be used to support the system 70 for expanding memory through a serial interface in performing the above steps 402, 505, 506, etc., and/or other processes for the technology described herein.
  • the initialization unit 703 can be used to support the system 70 for expanding memory through a serial interface in performing the above steps 501, 502, 504, 507, etc., and/or other processes for the technology described herein.
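  • the assignment of steps to units listed above can be captured as a small lookup table. The Python sketch below is illustrative only: the dictionary `UNIT_STEPS`, its key names and the helper `unit_for_step` are hypothetical, while the step numbers are those listed for units 701, 702 and 703.

```python
# Step numbers handled by each functional unit of system 70, as listed above.
UNIT_STEPS = {
    "rate_negotiation_unit_701": [401, 403, 503, 508, 509],
    "rate_acquisition_unit_702": [402, 505, 506],
    "initialization_unit_703": [501, 502, 504, 507],
}


def unit_for_step(step: int) -> str:
    """Return the name of the unit that performs the given step."""
    for unit, steps in UNIT_STEPS.items():
        if step in steps:
            return unit
    raise KeyError(step)


print(unit_for_step(506))
```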
  • the system 70 for expanding memory through a serial interface provided in this embodiment is used to perform the above-mentioned method of unifying the clock frequency, and therefore can achieve the same effect as the above-mentioned implementation method.
  • the system 70 for expanding the memory through the serial interface can also be the system 80 for expanding the memory through the serial interface as shown in FIG. 8 , including a processing module and a storage module.
  • the processing module can be used to control and manage the actions of the system 70 that expands the memory through the serial interface.
  • for example, it can be used to support the system 70 that expands the memory through the serial interface in executing the steps performed by the above-mentioned rate negotiation unit 701, rate acquisition unit 702 and initialization unit 703.
  • the storage module can be used to store program code, data, etc. in the system 70 that supports memory expansion through a serial interface.
  • the processing module may be a processor or a controller. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure.
  • a processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, etc.
  • the storage module may be a memory.
  • when the processing module is a processor and the storage module is a memory, the processing module may be the processor/controller/control circuit in the main chip of the present application, and/or the processor/controller/control circuit in the cache chip.
  • the storage module may be a memory in the main chip and/or a memory in the cache chip.
  • the system for expanding memory through a serial interface involved in this embodiment may be a system with the structure shown in FIG. 2 .
  • An embodiment of the present application also provides an electronic device, including one or more processors and one or more memories.
  • the one or more memories are coupled to one or more processors.
  • the one or more memories are used to store computer program codes.
  • the computer program codes include computer instructions.
  • Embodiments of the present application also provide a computer storage medium.
  • Computer instructions are stored in the computer storage medium.
  • when the computer instructions are run on an electronic device, they cause the electronic device to execute the above related method steps to implement the method of unifying the clock frequency in the above embodiment.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product When the computer program product is run on a computer, it causes the computer to perform the above related steps to implement the method of unifying the clock frequency performed by the electronic device in the above embodiment.
  • embodiments of the present application also provide a device.
  • This device may be a chip, a component or a module.
  • the device may include a connected processor and a memory.
  • the memory is used to store computer execution instructions.
  • the processor can execute computer execution instructions stored in the memory, so that the chip executes the unified clock frequency method executed by the electronic device in each of the above method embodiments.
  • the system, electronic device, computer storage medium, computer program product or chip provided in this embodiment are all used to execute the corresponding method provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not described again here.
  • Another embodiment of the present application provides a system, which may include the above-mentioned main chip, a cache chip, and multiple memory chips, and may be used to implement the above-mentioned method of unifying the clock frequency.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division.
  • in actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a device (which can be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, read only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program code.

Abstract

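This application provides a method and apparatus for unifying clock frequencies, relating to the field of chip technologies. When a serial interface is used to expand memory, the frequencies of the SerDes and the DRAM can be unified so that the system works in a synchronous clock domain and achieves low latency. The solution is as follows: the clocks of a first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip have the same source, and the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are coupled to the second phase-locked loop. The main chip and the cache chip perform a first rate negotiation to determine a first target rate; the main chip initializes the cache chip at the first target rate and determines a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip; the clock frequencies of the first phase-locked loop and the second phase-locked loop are both configured to the clock frequency corresponding to the second target rate, and a second rate negotiation is completed. The embodiments of this application are used in systems that expand DRAM through a serial interface.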

Description

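Method and apparatus for unifying clock frequency
This application claims priority to Chinese patent application No. 202210488354.X, entitled "Method and apparatus for unifying clock frequency", filed with the China National Intellectual Property Administration on May 6, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of chip technology, and in particular to a method and an apparatus for unifying clock frequency.
Background
At present, the speed of the double data rate (DDR) architecture is improving slowly, and the industry commonly uses a serial interface, namely a serializer-deserializer (SerDes), to attach DDR modules or other memory media and thereby increase memory bandwidth.
In a scheme that extends memory by attaching dynamic random-access memory (DRAM) through SerDes, the system typically includes a main central processing unit (CPU) chip, a cache chip, and DRAM chips. The interface between the main CPU chip and the cache chip is a SerDes serial interface, and the interface between the cache chip and the DRAM chips is a dual in-line memory module (DIMM) interface. The transmission frequency points of SerDes and of DRAM belong to two separate systems and are unrelated.
Specifically, the DIMM interface of DRAM has multiple transmission frequency points in practice. Only after a DIMM is inserted into the DRAM slot can the system determine the frequency point at which the DRAM chip works. The system then configures phase-locked loop (PLL) 3 in the cache chip, and the DDR clock clk_ddr, to the frequency corresponding to that frequency point. On the SerDes side, current serial protocols also usually work at fixed frequency points. Therefore, in current technology, the clock clk_sds on the SerDes side of the cache chip and the DDR clock clk_ddr are usually asynchronous, and the data stream must cross a clock domain, with asynchronous processing, in both the transmit (TX) and the receive (RX) direction. In addition, the system bus clock clk_bus in the main CPU chip and the working clock clk_sds of its SerDes interface are usually asynchronous too, so data again needs asynchronous processing in both directions. A data stream therefore crosses asynchronous clock domains 4 times over the TX and RX directions. Moreover, if the main CPU chip and the cache chip do not share a clock source, a data stream crosses asynchronous clock domains 6 times, and these crossings add considerable latency.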
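Summary
The embodiments of this application provide a method and an apparatus for unifying clock frequency. When a serial interface is used to extend memory, the frequencies of the SerDes and the DRAM are unified so that the whole system works in a synchronous clock domain, which reduces the number of times the data stream crosses asynchronous clocks and ultimately achieves low latency.
To achieve the above objective, the embodiments of this application adopt the following technical solutions:
In a first aspect, a method for unifying clock frequency is provided, applied to a system that extends memory through a serial interface. The system includes a main chip, a cache chip, and at least one memory chip coupled to the cache chip. A first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface and the serial interface in the cache chip are coupled to the second phase-locked loop. The method includes:
After the system controls the main chip and the cache chip to complete serial interface initialization, a first rate negotiation between the main chip and the cache chip is performed to determine a first target rate at which the main chip accesses the cache chip. After the system controls the main chip to finish initializing the cache chip at the first target rate, a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip is determined. The system configures the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controls the main chip and the cache chip to complete a second rate negotiation.
Here, the serial interface can be understood as a SerDes, the phase-locked loop may be a PLL, and the dual in-line memory module may be a DIMM.
Thus, in a system that extends memory through a serial interface, all SerDes in the system use the same common reference clock, and the working clock of the DRAM comes from the same PLL as the SerDes. In other words, the DRAM and SerDes clocks share a common source.
On top of this common-source clocking, the application performs rate negotiation in two rounds during system startup so that the DRAM and SerDes frequency points are unified. In the first rate negotiation, the main chip and the cache chip first negotiate a low-frequency rate, the first target rate. At this rate, the main chip can initialize the cache chip and obtain its information, namely the rate of the memory chip inserted on the cache chip. With this information the main chip can determine the frequency point at which the DRAM and the SerDes work together, which is the frequency point corresponding to the highest rate of the DIMM interface once the memory chip is inserted into the slot. The main chip then initializes the PLL in the main chip and the PLL in the cache chip to the frequency point corresponding to the rate of the memory chip. Based on the reinitialized PLL frequency, the main chip and the cache chip perform the second rate negotiation; when it succeeds, the DRAM and the SerDes have negotiated a unified target rate, the second target rate. After both negotiations complete, service transmission can begin among the main chip, the cache chip, and the memory chip.
In this way, the main chip and the cache chip use a common reference clock, and their highest rate is tied to the highest rate supported by the memory chip on the DIMM interface of the cache chip. When the memory chip on the DIMM interface supports a different highest rate, the transmission rate of the SerDes serial interfaces of the main chip and the cache chip changes accordingly. Data crossing multiple chips therefore still works in a synchronous clock domain, with no latency cost for asynchronous processing, which reduces the extra latency that asynchronous clocks add to the system.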
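In a possible design, after controlling the main chip and the cache chip to complete serial interface initialization, performing the first rate negotiation between the main chip and the cache chip to determine the first target rate at which the main chip accesses the cache chip includes: controlling the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop and the working clock frequency of the second phase-locked loop, which are the same; determining the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop, and the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop; and performing the first rate negotiation between the main chip and the cache chip based on these two transmission rates and, when the negotiation succeeds, determining them to be the first target rate at which the main chip accesses the cache chip.
It should be understood that after the main chip and the cache chip power up and complete serial interface initialization, and once the clock frequencies of the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip are determined from the common reference clock frequency, the transmission rates of the main chip and the cache chip are in effect determined, and they are the same. When the main chip then negotiates successfully with the cache chip at this transmission rate, the first target rate at which the main chip can access the cache chip is determined. In other words, the main chip and the cache chip have negotiated a low-frequency rate through which the main chip can access the cache chip.
In a possible design, after controlling the main chip to finish initializing the cache chip at the first target rate, determining the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip includes: controlling the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip; controlling the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determining the highest rate supported by the dual in-line memory module interface, and using that highest rate as the second target rate; and controlling the cache chip to record the second target rate in the register of the cache chip.
The memory chips inserted into the DIMM interfaces of the cache chip may support different access rates. Once the main chip can access the cache chip, the cache chip can record in its register the access rate supported by the memory chip currently inserted in the DIMM interface. The main chip can then determine from this register the highest rate supported by the memory chip (the second target rate) and adjust the clock frequencies of the main chip and the cache chip to match that rate, which avoids crossing asynchronous clock domains and reduces data transmission latency.
In a possible design, configuring the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controlling the main chip and the cache chip to complete the second rate negotiation, includes: controlling the main chip to read the second target rate from the register of the cache chip; controlling the main chip to initialize the clock frequencies of the first and second phase-locked loops according to the second target rate, so that both equal the clock frequency corresponding to the second target rate; and controlling the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
In this way, the SerDes in the main chip and the cache chip, and the DRAM between the cache chip and the memory chip, work at a unified clock frequency (frequency point, or second target rate). Because the first PLL and the second PLL share a common clock source, and the DIMM clock and the serial interface clock in the cache chip share a common source as well, the main chip avoids crossing asynchronous clock domains when it accesses the memory chip through the cache chip, which reduces data transmission latency.
In a possible design, the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation. That is, negotiation is not limited to a few fixed frequencies (for example, the 8 Gbps and 16 Gbps of PCIe); the two-round negotiation process of this application can negotiate at any specified integer rate.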
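In a second aspect, a system that extends memory through a serial interface is provided. The system includes a main chip, a cache chip, and at least one memory chip coupled to the cache chip. A first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface and the serial interface in the cache chip are coupled to the second phase-locked loop, wherein:
The main chip is configured to: after completing serial interface initialization with the cache chip, complete a first rate negotiation with the cache chip and determine a first target rate at which the main chip accesses the cache chip; after finishing initialization of the cache chip at the first target rate, determine a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip; and configure the clock frequency of the first phase-locked loop in the main chip to the clock frequency corresponding to the second target rate. The cache chip is configured to configure the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate. The main chip is further configured to complete a second rate negotiation with the cache chip.
For the beneficial effects of the second aspect and any possible design of the second aspect, refer to the description of the first aspect.
In a possible design, the main chip is configured to control the first phase-locked loop to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop; the cache chip is configured to control the second phase-locked loop to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the second phase-locked loop, the two working clock frequencies being the same; the main chip is configured to determine the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop; the cache chip is configured to determine the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop; the main chip is configured to perform the first rate negotiation between the main chip and the cache chip based on the two transmission rates and, when the negotiation succeeds, determine the transmission rate of the serial interface in the main chip to be the first target rate at which the main chip accesses the cache chip; and the cache chip is configured to, when the negotiation succeeds, determine the transmission rate of the serial interface in the cache chip to be the first target rate at which the main chip accesses the cache chip.
In a possible design, the main chip is configured to control the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip; the cache chip is configured to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determine the highest rate supported by the dual in-line memory module interface, use that highest rate as the second target rate, and record the second target rate in the register of the cache chip.
In a possible design, the main chip is configured to: read the second target rate from the register of the cache chip; initialize the clock frequencies of the first and second phase-locked loops according to the second target rate, so that both equal the clock frequency corresponding to the second target rate; and control the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
In a possible design, the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation.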
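In a third aspect, a frequency control apparatus is provided, applied to a system that extends memory through a serial interface. The system includes a main chip, a cache chip, and at least one memory chip coupled to the cache chip. A first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface and the serial interface in the cache chip are coupled to the second phase-locked loop. The frequency control apparatus includes:
A rate negotiation unit, configured to, after controlling the main chip and the cache chip to complete serial interface initialization, control them to perform a first rate negotiation between the main chip and the cache chip to determine a first target rate at which the main chip accesses the cache chip; and a rate acquisition unit, configured to, after controlling the main chip to finish initializing the cache chip at the first target rate, determine a second target rate supported by the DIMM interface between the cache chip and the memory chip. The rate negotiation unit is further configured to configure the clock frequency of the first PLL in the main chip and the clock frequency of the second PLL in the cache chip to the clock frequency corresponding to the second target rate, and to control the main chip and the cache chip to complete a second rate negotiation.
For the beneficial effects of the third aspect and any possible design of the third aspect, refer to the description of the first aspect.
In a possible design, the rate negotiation unit is configured to: control the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop and the working clock frequency of the second phase-locked loop, which are the same; determine the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop, and the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop; and perform the first rate negotiation between the main chip and the cache chip based on these two transmission rates and, when the negotiation succeeds, determine them to be the first target rate at which the main chip accesses the cache chip.
In a possible design, the rate acquisition unit is configured to: control the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip; control the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determine the highest rate supported by the dual in-line memory module interface, and use that highest rate as the second target rate; and control the cache chip to record the second target rate in the register of the cache chip.
In a possible design, the rate negotiation unit is configured to: control the main chip to read the second target rate from the register of the cache chip; control the main chip to initialize the clock frequencies of the first and second phase-locked loops according to the second target rate, so that both equal the clock frequency corresponding to the second target rate; and control the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
In a possible design, the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation.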
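In a fourth aspect, a frequency control apparatus is provided, including at least one processor connected to a memory. The at least one processor is configured to read and execute a program stored in the memory so that the apparatus performs the method of the first aspect or any design of the first aspect.
In a fifth aspect, a frequency control apparatus is provided, including a main chip, a cache chip, and a memory chip. The main chip is coupled to a memory and is configured to read and execute program instructions stored in the memory to implement the method of the first aspect or any design of the first aspect.
In a sixth aspect, a frequency control apparatus is provided. The apparatus is contained in an electronic device and has the function of implementing the behavior of the electronic device in any of the above aspects and any possible implementation. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the function, for example a rate negotiation module or unit and a rate acquisition module or unit.
In a seventh aspect, a computer-readable storage medium is provided, including computer instructions that, when run on an electronic device, cause the electronic device to perform the method of the first aspect or any possible design of the first aspect.
In an eighth aspect, a computer program product is provided that, when run on a computer or processor, causes the computer or processor to perform the method of the first aspect or any possible implementation.
It can be understood that any frequency control apparatus, computer-readable storage medium, or computer program product provided above can be applied to the corresponding method provided above; for their beneficial effects, refer to those of the corresponding method, which are not repeated here.
These and other aspects of this application are described more plainly in the following description.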
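Brief Description of the Drawings
Figure 1 is a schematic diagram of a scheme that uses SerDes to extend DRAM, provided by an embodiment of this application;
Figure 2 is a schematic diagram of the networking of a system that extends DRAM through a serial interface, provided by an embodiment of this application;
Figure 3 is a schematic structural diagram of a system networking provided by an embodiment of this application;
Figure 4 is a schematic flowchart of a method for unifying clock frequency provided by an embodiment of this application;
Figure 5 is a schematic flowchart of a method for unifying clock frequency provided by an embodiment of this application;
Figure 6 is a schematic structural diagram of a system that extends memory through a serial interface, provided by an embodiment of this application;
Figure 7 is a schematic structural diagram of a system that extends memory through a serial interface, provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of a system that extends memory through a serial interface, provided by an embodiment of this application.
Detailed Description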
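For ease of understanding, some concepts related to the embodiments of this application are explained below for reference:
DRAM: dynamic random-access memory, the most common type of system memory. DRAM can hold data only for a short time. It stores data in capacitors, which must be refreshed periodically; if a storage cell is not refreshed, the stored information is lost, for example when power is removed.
DDR: double data rate. DDR1 through DDR5 are memory generations, and different generations have different transfer rates. DDR1 denotes the first generation; in theory, the transfer rate of DDR2 is twice that of DDR1, and the transfer rate of DDR5 is twice that of DDR3.
DIMM: can be understood as the memory module inserted into a DRAM slot; it provides a 64-bit data channel.
SerDes: a mainstream time-division-multiplexed, point-to-point serial communication technology. At the transmitter, multiple low-speed parallel signals are converted into one high-speed serial signal, which passes through the transmission medium and is converted back into low-speed parallel signals at the receiver. This point-to-point serial technique makes full use of the channel capacity of the medium, reduces the number of transmission channels and device pins required, and raises signal transmission speed, which greatly lowers communication cost.
Asynchronous clocks: when two clocks have a fixed phase relationship, they are called synchronous clocks. Clocks from a common source are generally synchronous, for example two clocks generated by the same mixed-mode clock manager (MMCM) or phase-locked loop (PLL). A master clock and the clocks derived from it can therefore be constrained into the same clock group. When the phase relationship between two clocks cannot be determined, they are called asynchronous clocks. Two clocks from different crystal oscillators are always asynchronous. In general, different master clocks in a design are asynchronous, so the two master clocks and their derived clocks can be constrained into different clock groups.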
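As a rough illustration of the clock-group idea above, the following Python sketch groups clocks by the master clock they derive from and treats two clocks as asynchronous when their masters differ. The clock and oscillator names are hypothetical, and real timing constraints involve more than a source lookup.

```python
from collections import defaultdict
from typing import Dict, List


def clock_groups(master_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group each clock under the master clock it is derived from."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for clock, master in master_of.items():
        groups[master].append(clock)
    return dict(groups)


def is_asynchronous(master_of: Dict[str, str], a: str, b: str) -> bool:
    """Clocks derived from different masters have no known phase relation."""
    return master_of[a] != master_of[b]


# Hypothetical design: two oscillators, three derived clocks.
MASTER_OF = {"clk_a": "osc0", "clk_a_div2": "osc0", "clk_b": "osc1"}
GROUPS = clock_groups(MASTER_OF)
```

Here `clk_a` and `clk_a_div2` land in one group, while `clk_b` sits in its own group and is asynchronous to both.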
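PLL: uses an externally supplied reference signal to control the frequency and phase of the oscillation signal inside the loop. Because a PLL makes the output signal frequency track the input signal frequency automatically, it is commonly used in closed-loop tracking circuits.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In this description, unless otherwise stated, "/" means "or"; for example, A/B can mean A or B. "And/or" merely describes an association between objects and covers three cases; for example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, "multiple" means two or more.
In the following, the terms "first" and "second" are used for description only and shall not be understood as indicating or implying relative importance or the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of this embodiment, unless otherwise stated, "multiple" means two or more.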
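In a computer system, memory is a key factor in overall performance: a fast CPU without a good memory system to match cannot deliver its performance. Server CPU core counts have grown roughly 16-fold (from 8 to 126 cores), while pin counts and memory channels (from 4 to 8 channels) have only doubled. Memory bandwidth is therefore not keeping pace with core count, and the average DRAM bandwidth per core keeps falling.
A standard DDR module usually has a 64-bit structure: it transfers 64 bits at a time over a 64-bit parallel memory bus. Because of inherent limitations, however, a parallel bus can hardly achieve a leap in performance. First, the lines of a parallel bus interfere with one another easily, which makes the signal unstable and the frequency hard to raise quickly; the slow, step-by-step progress of memory specifications is driven less by market considerations than by technical reality. Second, parallel data sent by a memory module must arrive at the receiver within the same transfer beat, which requires the 64 traces on the printed circuit board (PCB) to be strictly equal in length and places harsh demands on PCB design. As memory frequency rises, the tolerated trace-length error shrinks, and routing eventually becomes a major problem. Given this trend, parallel memory has limited room to grow and will eventually move toward narrower widths and serialization. Third, also because of routing, the DDR architecture can practically implement only two channels, which already need 128 data traces and occupy a large PCB area. Even if 4 channels were to be implemented, the motherboard has little PCB space left, and data synchronization would become even more troublesome. These shortcomings keep DDR speed upgrades slow, so the industry has begun to use serial interfaces to attach DDR modules or other memory media.
At present, Open Memory Interface (OMI) technology can be used to increase system memory bandwidth and capacity. For the DDR architecture, it aims to address these near-memory challenges in two ways: migrating to SerDes, and using a DIMM controller. SerDes links replace the current DDR-style interface and provide higher speed with fewer signals. OMI essentially removes the memory controller from the host and integrates it on the memory DIMM, which simplifies processor design. The controller can also connect to many different types of memory, acting as a bridge between memory and processor.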
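As shown in Figure 1, in the scheme that uses SerDes to extend DRAM, chip 1 is the main chip (main CPU chip), chip 2 is the cache chip, and chip 3 is a memory chip. Multiple chips 3 can be coupled to one chip 2 to extend memory. The interface between chip 1 and chip 2 is a SerDes serial interface, and the interface between chip 2 and chip 3 is a parallel DIMM interface.
In existing schemes, the working frequency point of the SerDes and that of the DRAM are unrelated. Chip 1 contains a bus clock clk_bus and a logic clock clk_logic. The working clock clk_sds1 of PLL1 on the SerDes1 (serial interface 1) side uses reference clock REF_CLK1, and in chip 2 the working clock clk_sds2 of PLL2 on the SerDes2 (serial interface 2) side uses reference clock REF_CLK2. The working clock clk_ddr of DDR physical interface (PHY) 2, driven by PLL3 coupled to the DIMM interface in chip 2, shares its reference clock, REF_CLK2, with clk_sds2.
In practice the DIMM interface has transmission rates at several frequency points; for example, DDR5 transfer rates include 4.8 GT/s, 5.2 GT/s, 6.8 GT/s, and 8.4 GT/s. Only after a DIMM module (DDR PHY3) is inserted into the DRAM slot can chip 1 determine the working frequency point of that DRAM and then configure the working clock clk_ddr, which PLL3 in chip 2 supplies to DDR PHY2, to that frequency point. On the SerDes2 side of chip 2, current serial protocols also usually work at fixed frequency points. For example, the data rate of the peripheral component interconnect express (PCIe) protocol, a high-speed serial computer expansion bus standard, takes a few fixed values such as 2.5 GT/s, 5 GT/s, 8 GT/s, 16 GT/s, and 32 GT/s.
The working frequency points of the DRAM and the SerDes are therefore not unified. In current schemes that use SerDes to extend DRAM, clk_sds2 and clk_ddr in chip 2 are usually asynchronous, and the data stream must cross a clock domain in both the TX and the RX direction: in Figure 1, the data stream needs asynchronous processing at the TX crossing marked ③ and the RX crossing marked ④. In addition, the system bus clock clk_bus in chip 1 and the working clock clk_sds1 of serial interface SerDes1 in chip 1 are usually asynchronous too, so the data stream also needs asynchronous processing at the TX crossing marked ① and the RX crossing marked ⑥ in Figure 1.
Furthermore, if SerDes1 in chip 1 and SerDes2 in chip 2 share a reference clock and the frequency points of clk_sds1 and clk_sds2 are unified, the data stream needs no asynchronous processing at the TX crossing marked ② and the RX crossing marked ⑤. If SerDes1 and SerDes2 do not share a reference clock (PLL1 uses REF_CLK1 and PLL2 uses REF_CLK2), the data stream does need asynchronous processing at ② and ⑤.
As a result, when the working frequency points of the DRAM and the SerDes are not unified, the data stream crosses asynchronous clock domains 4 times if chip 1 and chip 2 share a clock source, and 6 times if they do not. Both cases add latency, and the latency essentially cannot be brought below 10 ns.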
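The crossing counts for the existing scheme can be tallied with a small bookkeeping sketch. It assumes, as a simplification, that each clock boundary in Figure 1 contributes one crossing in the TX direction and one in the RX direction, and that the SerDes1/SerDes2 boundary is synchronous only when the two chips share a reference clock at a unified frequency. The table and function names are illustrative, not taken from the application.

```python
# Crossing points in Figure 1, keyed by the pair of clocks on each side.
# Each boundary is crossed once on the TX path and once on the RX path.
PRIOR_ART_BOUNDARIES = {
    ("clk_bus", "clk_sds1"): (1, 6),
    ("clk_sds1", "clk_sds2"): (2, 5),
    ("clk_sds2", "clk_ddr"): (3, 4),
}


def count_async_crossings(shared_ref_clock: bool) -> int:
    """Count asynchronous crossings for one TX + RX round trip."""
    total = 0
    for boundary, points in PRIOR_ART_BOUNDARIES.items():
        # With a shared reference clock the chip-to-chip SerDes link
        # needs only phase alignment, not asynchronous processing.
        if boundary == ("clk_sds1", "clk_sds2") and shared_ref_clock:
            continue
        total += len(points)
    return total


crossings_shared = count_async_crossings(shared_ref_clock=True)
crossings_separate = count_async_crossings(shared_ref_clock=False)
```

The two results correspond to the 4-crossing and 6-crossing cases described above.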
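Therefore, this application provides a method and an apparatus for unifying clock frequency. When a serial interface is used to extend memory, the frequencies of the DRAM and the SerDes are unified so that the whole system works in a synchronous clock domain, which reduces the number of times the data stream crosses asynchronous clocks and ultimately achieves low latency.
In a system that extends memory through a serial interface, all SerDes in the system use the same common reference clock, and the working clock of the DRAM comes from the same PLL as the SerDes. In other words, the DRAM and SerDes clocks share a common source.
On top of this common-source clocking, the application performs rate negotiation in two rounds during system startup so that the DRAM and SerDes frequency points are unified. In the first rate negotiation, the main chip and the cache chip first negotiate a low-frequency rate. At this rate, the main chip can initialize the cache chip and obtain its information, namely the rate of the memory chip inserted on the cache chip. With this information the main chip can determine the frequency point at which the DRAM and the SerDes work together, which is the frequency point corresponding to the highest rate of the DIMM interface once the memory chip is inserted into the slot. The main chip then initializes the PLL in the main chip and the PLL in the cache chip to the frequency point corresponding to the rate of the memory chip. Based on the reinitialized PLL frequency, the main chip and the cache chip perform the second rate negotiation; when it succeeds, the DRAM and the SerDes have negotiated a unified target rate. After both negotiations complete, service transmission can begin among the main chip, the cache chip, and the memory chip.
In this way, the main chip and the cache chip use a common reference clock, and their highest rate is tied to the highest rate supported by the memory chip on the DIMM interface of the cache chip. When the memory chip on the DIMM interface supports a different highest rate, the transmission rate of the SerDes serial interfaces of the main chip and the cache chip changes accordingly. Data crossing multiple chips therefore still works in a synchronous clock domain, with no latency cost for asynchronous processing, which reduces the extra latency that asynchronous clocks add to the system.
Note that the transmission rate and the transmission frequency (frequency point) are usually related by a fixed multiple. For example, the front side bus (FSB) uses the "quad pumped" technique, which transfers data four times in each bus clock cycle, so the data transfer rate of the bus equals 4 times the bus clock frequency. A bus with a 333 MHz clock thus has a data transfer rate of 1332 MT/s, that is, 1.332 GT/s.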
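The quad-pumped arithmetic above is a single multiplication, shown here as a minimal Python sketch. The function name and the default of four transfers per cycle are illustrative choices.

```python
def transfer_rate_mt_s(clock_mhz: int, transfers_per_cycle: int = 4) -> int:
    """Data transfer rate in MT/s for a bus clocked at clock_mhz."""
    return clock_mhz * transfers_per_cycle


# The 333 MHz quad-pumped bus from the text.
fsb_rate_mt_s = transfer_rate_mt_s(333)
fsb_rate_gt_s = fsb_rate_mt_s / 1000
```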
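Once the transmission rate of the SerDes and that of the DIMM interface are determined to be the same, the clock frequencies of PLL1 and PLL2 are in effect the same as well. Combined with PLL1 and PLL2 sharing a common reference clock, data transmitted across the main chip, the cache chip, and the memory chip needs no clock-domain-crossing processing.
The method for unifying clock frequency provided by this application can be applied to the networking of a system that extends DRAM through a serial interface. As shown in Figure 2, the networking includes three types of chips: chip 1 is the main chip, or main CPU chip; chip 2 is the cache chip; and chip 3 is a DRAM memory chip, shown in Figure 2 as chips 3_0, 3_1, 3_2, and 3_3. Chip 1 is located on the motherboard, and chips 2 and 3 are located on a daughter card. Figure 2 is only an illustrative form; chip 2 and chip 3 may also be mounted directly on the motherboard.
The interface between chip 1 and chip 2 is a serial interface (SerDes), and the interface between chip 2 and chip 3 is a DIMM interface (parallel DIMM interface). Chip 1 can attach more DIMM interfaces through chip 2, so the number of chips 3 is not limited to the 4 shown in Figure 2. Extending DRAM through a serial interface in this way allows more DIMM interfaces on chip 2 and thus a larger DRAM capacity. When the bandwidth of the serial interface exceeds that of a directly connected DIMM interface, more chips 3 can be coupled to chip 2 so that chip 1 obtains greater DRAM access bandwidth.
Chip 2, as the cache chip, exists to extend memory so that chip 1 can access more chips 3. In some embodiments, chip 1 may have at least 2 serial interfaces, each coupled to one chip 2, to obtain a larger DRAM capacity.
Building on the networking in Figure 2, the system networking structure of this application can follow the architecture shown in Figure 3. Here, clk_bus is the bus clock in chip 1; PLL1 is the phase-locked loop that sets the clock frequency of serial interface SerDes1 in chip 1, and clk_sds1 is the clock with which SerDes1 sends and receives data; PLL2 is the phase-locked loop that sets the clock frequency of serial interface SerDes2 in chip 2, and clk_sds2 is the clock with which SerDes2 sends and receives data; and clk_ddr is the clock of DDR PHY2 in chip 2, that is, the clock of the DIMM interface in chip 2.
It should be understood that the purpose of unifying the SerDes and DRAM clock frequencies is to reduce the latency of the whole system, and the key to reducing latency is to reduce the asynchronous clock-domain crossings on the data path. The rate unification of this application unifies the frequencies so that the interface logic of chip 1, chip 2, and chip 3 works in a synchronous clock domain and no asynchronous clock domain is crossed.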
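Referring to Figure 3, the serial interfaces SerDes1 and SerDes2 of chip 1 and chip 2 use a common reference clock REF_CLK. Software can configure the clock source of PLL1 and the clock source of PLL2 to the same reference clock REF_CLK. With PLL1 and PLL2 sharing a source, SerDes1 in chip 1 and SerDes2 in chip 2 can work at the same frequency point, and PLL1 and PLL2 lock to that frequency point. This clock configuration may be performed by chip 1. Because PLL1 and PLL2 share the reference clock, clk_sds1 and clk_sds2 have the same frequency and differ only in phase. When data output by SerDes1 of chip 1 is sampled at SerDes2 of chip 2, only the sampling phase needs adjusting; no asynchronous clock-domain-crossing processing is required.
Moreover, clk_sds2 and clk_ddr in chip 2 both come from PLL2. They have either the same frequency or a multiple relationship, so they are in the same clock domain. The digital logic of the serial interface controller (not shown in Figure 3) and the digital logic of the DDR controller are therefore in a synchronous clock domain and need no asynchronous cross-clock processing.
Consequently, along the entire path from chip 1 through chip 2 to chip 3, the data stream needs no clock-domain-crossing processing, which reduces latency.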
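The distinction drawn above can be made concrete with a deliberately simplified model: two clocks are treated as needing an asynchronous crossing unless they trace back to the same reference clock and their frequencies are equal or integer multiples of each other. This rule, the `Clock` type, and all frequency values below are assumptions chosen for illustration; the application does not give concrete clk_ddr frequencies for either figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clock:
    name: str
    ref: str       # reference clock feeding the PLL that generates it
    freq_mhz: int  # illustrative frequency, not taken from the source


def needs_async_crossing(a: Clock, b: Clock) -> bool:
    """Simplified rule: same reference and integer frequency ratio => synchronous."""
    if a.ref != b.ref:
        return True
    high, low = max(a.freq_mhz, b.freq_mhz), min(a.freq_mhz, b.freq_mhz)
    return high % low != 0


# Unified scheme (Figure 3): one REF_CLK, clk_ddr derived from PLL2.
clk_sds1 = Clock("clk_sds1", "REF_CLK", 8000)
clk_sds2 = Clock("clk_sds2", "REF_CLK", 8000)
clk_ddr = Clock("clk_ddr", "REF_CLK", 4000)

# Existing scheme (Figure 1): PLL3 runs at an unrelated DIMM frequency.
old_sds2 = Clock("clk_sds2", "REF_CLK2", 8000)
old_ddr = Clock("clk_ddr", "REF_CLK2", 2400)
```

Under this model the unified scheme needs no asynchronous crossing on either boundary, while the existing scheme does, even though PLL2 and PLL3 share REF_CLK2.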
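The method for unifying clock frequency is described below. Referring to Figure 4, the method is applied to a system that extends memory through a serial interface; the system includes a main chip, a cache chip, and at least one memory chip coupled to the cache chip. A first PLL coupled to the serial interface in the main chip and a second PLL coupled to the serial interface in the cache chip share a common clock source. Both the DIMM interface and the serial interface in the cache chip are coupled to the second PLL. The method includes:
401. After the system controls the main chip and the cache chip to complete serial interface initialization, a first rate negotiation between the main chip and the cache chip is performed to determine a first target rate at which the main chip accesses the cache chip.
The main chip in Figure 4 corresponds to chip 1 in Figures 2 and 3, the cache chip to chip 2, and the memory chip to chip 3. The first PLL in Figure 4 corresponds to PLL1 in Figure 3, and the second PLL to PLL2.
In some embodiments, after the system powers up, the main chip controls itself and the cache chip to complete serial interface initialization. During initialization, the main chip controls the first PLL and the second PLL to configure their clock frequencies from the common reference clock REF_CLK. After configuration the two PLLs have the same clock frequency, so serial interface SerDes1 of the main chip and serial interface SerDes2 of the cache chip run at the same rate, the first target rate. The main chip then controls SerDes1 and SerDes2 to start rate negotiation and perform the first rate negotiation. If the negotiation succeeds, the main chip determines that the rate at which it accesses the cache chip is the first target rate.
402. After the system controls the main chip to finish initializing the cache chip at the first target rate, a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip is determined.
The main chip can access the registers in the cache chip through SerDes1 and SerDes2 and initialize them so that they can be accessed, including read, write, and rewrite operations. The main chip can also initialize the DIMM interface in the cache chip, obtain the highest rate supported by the memory chip inserted in that DIMM interface, and have this highest rate recorded in a register of the cache chip. The highest rate supported by the memory chip is denoted the second target rate.
403. The system configures the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controls the main chip and the cache chip to complete a second rate negotiation.
The main chip can access the register of the cache chip to obtain the second target rate, configure the clock frequencies of the first PLL and the second PLL to the clock frequency corresponding to the second target rate, and then have SerDes1 of the main chip and SerDes2 of the cache chip perform the second rate negotiation at the second target rate. When this negotiation succeeds, the frequency unification process ends, and the SerDes and the DRAM in the system work at a unified frequency point.
Thus, in a system that extends DDR through a SerDes interface, the first PLL in the main chip and the second PLL share a common clock source and the DRAM clock is coupled to the second PLL. On this basis, two rounds of rate negotiation unify the clock frequencies of the first PLL, the second PLL, and the DRAM, so data transmitted across chips in the system still works in a synchronous clock domain, which reduces system latency.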
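Steps 401 to 403 can be summarized in a short Python sketch. It assumes the 4x relationship between PLL clock frequency and serial transmission rate that the description uses in its own examples, models the cache chip register as a dictionary entry, and skips the actual link handshake. The function name, parameters, and return shape are hypothetical.

```python
from typing import Dict, Tuple

RATE_PER_CLOCK = 4  # assumed: transmission rate (MT/s) = 4 x PLL clock (MHz)


def unify_clock_frequency(
    ref_clk_mhz: int, boot_multiplier: int, dimm_max_rate_mt_s: int
) -> Tuple[int, int, int]:
    """Return (first target rate, second target rate, final PLL clock in MHz)."""
    # Step 401: both PLLs multiply the shared reference clock, so both
    # serial interfaces come up at the same low-speed rate.
    pll1_mhz = pll2_mhz = ref_clk_mhz * boot_multiplier
    first_target = pll1_mhz * RATE_PER_CLOCK

    # Step 402: at the first target rate, the cache chip probes the DIMM
    # and records its highest supported rate in a register.
    cache_registers: Dict[str, int] = {"REG.max_dimm_rate": dimm_max_rate_mt_s}
    second_target = cache_registers["REG.max_dimm_rate"]

    # Step 403: reconfigure both PLLs to match the second target rate,
    # after which the second negotiation runs at that rate.
    if second_target % RATE_PER_CLOCK != 0:
        raise ValueError("rate is not reachable with an integer PLL clock")
    pll1_mhz = pll2_mhz = second_target // RATE_PER_CLOCK
    return first_target, second_target, pll1_mhz


result = unify_clock_frequency(ref_clk_mhz=500, boot_multiplier=4,
                               dimm_max_rate_mt_s=32000)
```

With the figures used later in the description (500 MHz reference, 4x multiplication, 32 GT/s DIMM rate), the link boots at 8000 MT/s and both PLLs end at 8000 MHz.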
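Building on the system shown in Figure 4, the method for unifying clock frequency is described further below.
This application provides a method for unifying clock frequency. As shown in Figure 5, the method includes:
501. The system determines that the main chip, the cache chip, and the at least one memory chip are powered up.
For example, the system is in a terminal device. When the user switches the terminal device on, the motherboard powers up, and the main chip, the cache chip, and the at least one memory chip on it power up as well.
502. The system controls the main chip and the cache chip to complete serial interface initialization.
In some embodiments, the main chip controls the first PLL in the main chip and the second PLL in the cache chip to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first PLL and the working clock frequency of the second PLL, which are the same.
The transmission rate of serial interface 1 (SerDes1) in the main chip is determined from the working clock frequency of the first PLL, and the transmission rate of serial interface 2 (SerDes2) in the cache chip is determined from the working clock frequency of the second PLL.
For example, referring to Figure 6, the main chip includes logic controller 1 and the cache chip includes logic controller 2. When logic controller 1 determines from the bus clock clk_bus that the main chip has powered up, it sends a clock frequency configuration instruction to PLL1 in serial interface 1, and PLL1 multiplies the base clock frequency of the reference clock REF_CLK to obtain its working clock frequency. Logic controller 1 also sends, through serial interface 1, a clock frequency configuration instruction to PLL2 in serial interface 2, and PLL2 multiplies the base clock frequency provided by REF_CLK to obtain its working clock frequency.
Serial interface 1 then determines its transmission rate from the clock frequency of PLL1, and serial interface 2 determines its transmission rate from the clock frequency of PLL2.
For example, the reference clock REF_CLK provides a clock frequency of 500 MHz, and PLL1 and PLL2 both multiply it by 4, so both PLLs run at 2000 MHz. Assuming that by default the transmission rate of a serial interface is 4 times the PLL clock frequency, serial interface 1 and serial interface 2 both have a transmission rate of 8000 MT/s, that is, 8 GT/s.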
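The numbers in this example follow from two multiplications, sketched below in Python. The 4x multiplier and the 4x rate-to-clock ratio are the values assumed in the example itself; the function names are illustrative.

```python
def pll_output_mhz(ref_clk_mhz: int, multiplier: int) -> int:
    """Working clock of a PLL that multiplies its reference clock."""
    return ref_clk_mhz * multiplier


def serdes_rate_mt_s(pll_mhz: int, rate_per_clock: int = 4) -> int:
    """Serial transmission rate derived from the PLL working clock."""
    return pll_mhz * rate_per_clock


REF_CLK_MHZ = 500

# PLL1 and PLL2 use the same reference and the same multiplier,
# so they necessarily produce the same working clock.
pll1_mhz = pll_output_mhz(REF_CLK_MHZ, 4)
pll2_mhz = pll_output_mhz(REF_CLK_MHZ, 4)
rate1_mt_s = serdes_rate_mt_s(pll1_mhz)
rate2_mt_s = serdes_rate_mt_s(pll2_mhz)
```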
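503. The system controls the main chip and the cache chip to complete the first rate negotiation and determines the first target rate at which the main chip accesses the cache chip.
After the transmission rates of serial interface 1 and serial interface 2 are determined, the main chip performs the first rate negotiation between the main chip and the cache chip based on these two rates and, when the negotiation succeeds, determines them to be the first target rate at which the main chip accesses the cache chip.
The negotiation can resemble existing rate negotiation procedures. For example, logic controller 1 in the main chip controls serial interface 1 to send a message to serial interface 2 in the cache chip at 8 GT/s. If serial interface 2 receives the message at 8 GT/s and returns a response message confirming successful reception to serial interface 1, the first rate negotiation succeeds. The main chip determines that the first target rate at which serial interface 1 accesses the cache chip is 8 GT/s and can begin accessing the cache chip at this rate, for example to read, write, or rewrite the memory chip through the cache chip. The first target rate here can be regarded as a low-frequency rate: the main chip and the cache chip have completed a low-speed negotiation.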
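The message-and-response exchange described above can be mimicked with a toy model in which a receiver can only sample a message sent at its own configured rate. This is a sketch of the success condition only; it does not model any real link-training protocol, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerialInterface:
    name: str
    rate_mt_s: int

    def receive(self, sender_rate_mt_s: int) -> bool:
        """A message is received only if it was sent at this interface's rate."""
        return sender_rate_mt_s == self.rate_mt_s


def negotiate(initiator: SerialInterface, responder: SerialInterface) -> Optional[int]:
    """Return the agreed rate, or None if either direction fails."""
    request_ok = responder.receive(initiator.rate_mt_s)
    # The response message travels back at the responder's rate.
    response_ok = request_ok and initiator.receive(responder.rate_mt_s)
    return initiator.rate_mt_s if response_ok else None


first_target = negotiate(SerialInterface("SerDes1", 8000),
                         SerialInterface("SerDes2", 8000))
mismatch = negotiate(SerialInterface("SerDes1", 8000),
                     SerialInterface("SerDes2", 16000))
```

Because both PLLs are configured from the same reference clock before negotiation, the matching-rate case is the expected one.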
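504. The system controls the main chip to access the register of the cache chip at the first target rate and initialize the register of the cache chip.
At the first target rate, the main chip can access the registers in the cache chip through serial interface 1 and serial interface 2 and initialize them so that they are in a working state. For example, the main chip sends a register configuration initialization instruction to logic controller 2 through serial interface 1 and serial interface 2, and logic controller 2 initializes the registers in the cache chip accordingly.
In some embodiments, the cache chip is a buffer chip and typically has no CPU of its own. At the first target rate, the CPU in the main chip therefore has to configure the registers in the cache chip to bring them into operation. In this application, custom messages can be used to let the main chip access the registers in the cache chip.
505. After the system controls the DIMM interfaces in the cache chip to complete initialization, it obtains the highest rate supported by the DIMM interface between the cache chip and the memory chip, denoted the second target rate, and controls the cache chip to record the second target rate in the register of the cache chip.
After the registers in the cache chip are initialized, in some embodiments, controlling the DIMM interfaces in the cache chip to initialize can be understood as controlling the cache chip to access, through the DIMM interface, the memory chip connected to it, so as to determine the highest rate supported by the DIMM interface and use that highest rate as the second target rate.
Specifically, referring to Figure 6, logic controller 2 in the cache chip controls the DIMM interface DDR PHY2 in the cache chip to communicate with the DIMM interface DDR PHY3 of the memory chip (chip 3), so that logic controller 2 can determine the type of chip 3 inserted in the slot of DDR PHY2 and the highest rate that type supports. Logic controller 2 records this highest rate in a register of the cache chip, for example REG.max_dimm_rate, which stores the transmission rate.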
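A minimal sketch of step 505 follows, assuming a single populated slot and a lookup from module type to highest supported rate. The type labels and the lookup table are hypothetical (the rates are the DDR5 examples given earlier in the description); only the register name REG.max_dimm_rate comes from the text.

```python
from typing import Dict

# Hypothetical mapping from detected module type to highest rate in MT/s.
MAX_RATE_BY_TYPE_MT_S: Dict[str, int] = {
    "DDR5-4800": 4800,
    "DDR5-5200": 5200,
    "DDR5-6800": 6800,
    "DDR5-8400": 8400,
}


class CacheChip:
    """Toy model of the cache chip's logic controller 2 and register file."""

    def __init__(self, inserted_dimm_type: str) -> None:
        self.inserted_dimm_type = inserted_dimm_type
        self.registers: Dict[str, int] = {}

    def init_dimm_interface(self) -> int:
        """Detect the inserted module and record its highest rate."""
        max_rate = MAX_RATE_BY_TYPE_MT_S[self.inserted_dimm_type]
        self.registers["REG.max_dimm_rate"] = max_rate
        return max_rate


chip2 = CacheChip("DDR5-5200")
second_target = chip2.init_dimm_interface()
```

The main chip would later read `REG.max_dimm_rate` over the serial link, which is step 506.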
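506. The system controls the main chip to read the second target rate from the register of the cache chip.
Once the cache chip has stored the highest rate supported by the DIMM interface, it can notify the main chip through serial interface 2 that the highest rate has been determined. Logic controller 1 in the main chip then accesses the register of the cache chip again through serial interface 1 and serial interface 2, for example the REG.max_dimm_rate register above, reads the highest rate supported by the DIMM interface, and takes it as the second target rate.
It should be understood that the second target rate is usually higher than the first target rate. The first target rate serves the low-speed startup negotiation between the main chip and the cache chip; the second target rate is the rate that the serial interfaces negotiate after startup for the main chip to access the memory chip through the cache chip.
507. The system controls the main chip to initialize the clock frequencies of the first PLL and the second PLL according to the second target rate, so that both equal the clock frequency corresponding to the second target rate.
When the main chip obtains the second target rate, logic controller 1 reinitializes PLL1 and logic controller 2 reinitializes PLL2 according to the second target rate, that is, the clock frequencies of PLL1 and PLL2 are determined anew.
For example, suppose the second target rate is 32 GT/s, that is, the highest rate supported by the DIMM interface is 32 GT/s, and the transmission rate of a serial interface is 4 times the PLL clock frequency. Logic controller 1 then determines that the clock frequency of PLL1 after reinitialization is 8000 MHz, and logic controller 2 determines that the clock frequency of PLL2 after reinitialization is 8000 MHz.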
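Going from a target rate back to a PLL clock is the inverse of the earlier multiplication. The sketch below uses the same assumed 4x ratio and rejects rates that would not give an integer clock in MHz; the function name and the error behavior are illustrative choices.

```python
def pll_clock_for_rate(rate_mt_s: int, rate_per_clock: int = 4) -> int:
    """PLL clock in MHz that yields the given transmission rate in MT/s."""
    if rate_mt_s % rate_per_clock != 0:
        raise ValueError("rate does not map to an integer PLL clock")
    return rate_mt_s // rate_per_clock


SECOND_TARGET_MT_S = 32_000  # 32 GT/s, as in the example above

# Both logic controllers compute the same value, so PLL1 and PLL2 match.
pll1_mhz = pll_clock_for_rate(SECOND_TARGET_MT_S)
pll2_mhz = pll_clock_for_rate(SECOND_TARGET_MT_S)
```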
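508. The system controls the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
Continuing the example in step 507, once the clock frequencies of PLL1 and PLL2 are reinitialized to 8000 MHz, the transmission rates of serial interface 1 and serial interface 2 both equal the second target rate of 32 GT/s. Logic controller 1 then controls serial interface 1 and serial interface 2 to perform the second rate negotiation.
For example, logic controller 1 controls serial interface 1 to send a message to serial interface 2 at 32 GT/s, and serial interface 2 receives the message at 32 GT/s. If reception succeeds, serial interface 2 sends serial interface 1 a response message indicating that the negotiation succeeded.
In this way, the SerDes in the main chip and the cache chip, and the DRAM between the cache chip and the memory chip, work at a unified clock frequency (frequency point). Because PLL1 and PLL2 share a common clock source, and clk_ddr and clk_sds2 share a common source as well, the main chip avoids crossing asynchronous clock domains when it accesses the memory chip through the cache chip, which reduces data transmission latency.
In some embodiments, the serial interface of the main chip and the serial interface of the cache chip support negotiation at any specified integer rate and are not limited to a few fixed rates, for example the 8 GT/s and 16 GT/s supported by PCIe.
In some embodiments, analysis shows that the working frequency point (clock frequency) of the DRAM and that of the SerDes can be unified. Taking DDR5 as an example, the transmission parameters of DDR5 and of the SerDes are listed in Table 1.
Table 1
In Table 1, DDR TYPE indicates that the DRAM type is DDR5; DDR Data Rate is the DDR5 data transfer rate in MT/s; DDR PHY CLK is the clock frequency of the DDR5 DIMM interface; DDRC CLK is the clock frequency of the DDR5 DDR controller; and SerDes data CLK is the clock frequency at which the SerDes transfers data. It can be seen that DDR PHY CLK is twice SerDes data CLK. Suppose the first target rate of the first rate negotiation is the 2100 MT/s value of SerDes data CLK, used for low-speed startup; the second rate negotiation can then yield the DDR PHY CLK clock frequency, that is, a second target rate of, for example, 4200 MT/s. The application can thus unify the SerDes data CLK with the DDR PHY CLK value of 4200 MT/s, unifying the SerDes and DRAM clock frequencies.
509. The system controls service transmission among the main chip, the cache chip, and the memory chip.
With this application, the main chip and the cache chip use a common reference clock REF_CLK, and the clock of the DIMM interface of the cache chip shares a source with the clock of the SerDes serial interface of the cache chip. On this basis, the highest rate finally selected by the main chip and the cache chip, the second target rate, is tied to the highest rate supported by the memory chip inserted in the DIMM interface of the cache chip. When that highest rate differs, the transmission rate of the SerDes serial interface changes accordingly, which amounts to unifying the SerDes and DRAM clock frequencies. When the main chip accesses the memory chip through the cache chip, that is, when the data stream crosses multiple chips, the data stream travels in a synchronous clock domain with no asynchronous processing latency, which reduces data transmission latency. Analysis shows that, with careful design, the extra latency added across the whole system can be kept within 10 ns.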
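Note that the rate unification process, or frequency unification process, takes place during system power-up, but it is not limited to startup. It can also be performed during normal communication, by means of packet capture; for example, the main chip and the cache chip can perform the first and second rate negotiations during data packet transmission to complete rate unification.
In addition, the method can also be used to unify frequencies and reduce latency in schemes that use SerDes to attach components other than memory.
It can be understood that, to implement the above functions, the system that extends memory through a serial interface includes hardware and/or software modules corresponding to each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementations should not be considered beyond the scope of this application.
In this embodiment, the system that extends memory through a serial interface can be divided into functional modules according to the above method examples; for example, each function can be assigned its own functional module, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware. Note that the division into modules in this embodiment is schematic and is only a division by logical function; other divisions are possible in an actual implementation.
Where each function is assigned its own functional module, Figure 7 shows a possible composition of the system 70 that extends memory through a serial interface in the above embodiments. As shown in Figure 7, the system 70 may include a rate negotiation unit 701, a rate acquisition unit 702, and an initialization unit 703.
The rate negotiation unit 701 can be used to support the system 70 in performing steps 401, 403, 503, 508, and 509 above, and/or other processes of the techniques described herein.
The rate acquisition unit 702 can be used to support the system 70 in performing steps 402, 505, and 506 above, and/or other processes of the techniques described herein.
The initialization unit 703 can be used to support the system 70 in performing steps 501, 502, 504, and 507 above, and/or other processes of the techniques described herein.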
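The assignment of steps to units above can be written down as a small table and checked for gaps or overlaps. The dictionary layout and the helper function are illustrative; the step numbers per unit are exactly those listed in the text.

```python
from typing import Dict, List

# Steps each functional unit of system 70 supports, as listed above.
STEPS_BY_UNIT: Dict[str, List[int]] = {
    "rate_negotiation_unit_701": [401, 403, 503, 508, 509],
    "rate_acquisition_unit_702": [402, 505, 506],
    "initialization_unit_703": [501, 502, 504, 507],
}


def unit_for_step(step: int) -> str:
    """Look up which unit is responsible for a given step number."""
    owners = [unit for unit, steps in STEPS_BY_UNIT.items() if step in steps]
    if len(owners) != 1:
        raise LookupError(f"step {step} has {len(owners)} owners")
    return owners[0]


# Every step of Figure 4 (401-403) and Figure 5 (501-509) has one owner.
ALL_STEPS = sorted(s for steps in STEPS_BY_UNIT.values() for s in steps)
```

Sorting the combined list shows that the three units together cover steps 401 to 403 and 501 to 509, each exactly once.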
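Note that all relevant content of the steps in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules and is not repeated here.
The system 70 that extends memory through a serial interface provided in this embodiment is used to perform the above method for unifying clock frequency and therefore achieves the same effects as the above implementation methods.
Where integrated units are used, the system 70 may also take the form of the system 80 shown in Figure 8, which includes a processing module and a storage module. The processing module can control and manage the actions of the system 70; for example, it can support the system 70 in performing the steps carried out by the rate negotiation unit 701, the rate acquisition unit 702, and the initialization unit 703. The storage module can support the system 70 in storing program code, data, and the like.
The processing module may be a processor or a controller. It can implement or execute the various example logic blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor. The storage module may be a memory.
In one embodiment, when the processing module is a processor and the storage module is a memory, the processing module may be the processor, controller, or control circuit in the main chip of this application, and/or the processor, controller, or control circuit in the cache chip. The storage module may be the memory in the main chip and/or the memory in the cache chip. The system that extends memory through a serial interface in this embodiment may be a system with the structure shown in Figure 2.
An embodiment of this application further provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and store computer program code that includes computer instructions. When the one or more processors execute the computer instructions, the electronic device performs the above method steps to implement the method for unifying clock frequency in the above embodiments.
An embodiment of this application further provides a computer storage medium storing computer instructions. When the computer instructions run on an electronic device, the electronic device performs the above method steps to implement the method for unifying clock frequency in the above embodiments.
An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer performs the above steps to implement the method for unifying clock frequency performed by the electronic device in the above embodiments.
In addition, an embodiment of this application provides an apparatus, which may specifically be a chip, a component, or a module. The apparatus may include a processor and a memory connected to each other. The memory stores computer-executable instructions; when the apparatus runs, the processor executes the instructions stored in the memory so that the chip performs the method for unifying clock frequency performed by the electronic device in the above method embodiments.
The system, electronic device, computer storage medium, computer program product, and chip provided in this embodiment are all used to perform the corresponding methods provided above; for their beneficial effects, refer to those of the corresponding methods, which are not repeated here.
Another embodiment of this application provides a system that may include the above main chip, cache chip, and multiple memory chips, and that can be used to implement the above method for unifying clock frequency.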
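From the above description of the implementations, a person skilled in the art can understand that, for convenience and brevity, only the division into the above functional modules is used as an example. In practice, the functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to perform all or part of the functions described above.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method can be implemented in other ways. For example, the apparatus embodiments described above are only schematic. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another apparatus, or some features can be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
Units described as separate parts may or may not be physically separate, and parts shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above is only a specific implementation of this application, but the scope of protection of this application is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. The scope of protection of this application shall therefore be subject to the scope of protection of the claims.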

Claims (16)

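  1. A method for unifying clock frequency, characterized in that it is applied to a system that extends memory through a serial interface, the system comprising a main chip, a cache chip, and at least one memory chip coupled to the cache chip, wherein a first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are coupled to the second phase-locked loop, the method comprising:
    after the system controls the main chip and the cache chip to complete serial interface initialization, performing a first rate negotiation between the main chip and the cache chip to determine a first target rate at which the main chip accesses the cache chip;
    after the system controls the main chip to finish initializing the cache chip at the first target rate, determining a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip;
    configuring, by the system, the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controlling the main chip and the cache chip to complete a second rate negotiation.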
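  2. The method according to claim 1, characterized in that, after controlling the main chip and the cache chip to complete serial interface initialization, performing the first rate negotiation between the main chip and the cache chip to determine the first target rate at which the main chip accesses the cache chip comprises:
    controlling the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop and the working clock frequency of the second phase-locked loop, the two working clock frequencies being the same;
    determining the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop, and determining the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop;
    performing the first rate negotiation between the main chip and the cache chip based on the transmission rate of the serial interface of the main chip and the transmission rate of the serial interface in the cache chip, and, when the negotiation succeeds, determining these transmission rates to be the first target rate at which the main chip accesses the cache chip.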
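  3. The method according to claim 1 or 2, characterized in that, after controlling the main chip to finish initializing the cache chip at the first target rate, determining the second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip comprises:
    controlling the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip;
    controlling the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determining the highest rate supported by the dual in-line memory module interface, and using that highest rate as the second target rate;
    controlling the cache chip to record the second target rate in the register of the cache chip.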
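  4. The method according to claim 3, characterized in that configuring the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and controlling the main chip and the cache chip to complete the second rate negotiation, comprises:
    controlling the main chip to read the second target rate from the register of the cache chip;
    controlling the main chip to initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both equal the clock frequency corresponding to the second target rate;
    controlling the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
  5. The method according to any one of claims 1 to 4, characterized in that the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation.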
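  6. A system that extends memory through a serial interface, the system comprising a main chip, a cache chip, and at least one memory chip coupled to the cache chip, characterized in that a first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are coupled to the second phase-locked loop, wherein:
    the main chip is configured to, after completing serial interface initialization with the cache chip, complete a first rate negotiation with the cache chip and determine a first target rate at which the main chip accesses the cache chip;
    after finishing initialization of the cache chip at the first target rate, determine a second target rate supported by the dual in-line memory module interface between the cache chip and the memory chip;
    configure the clock frequency of the first phase-locked loop in the main chip to the clock frequency corresponding to the second target rate;
    the cache chip is configured to configure the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate;
    the main chip is further configured to complete a second rate negotiation with the cache chip.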
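  7. The system according to claim 6, characterized in that
    the main chip is configured to control the first phase-locked loop to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop;
    the cache chip is configured to control the second phase-locked loop to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the second phase-locked loop, the working clock frequency of the first phase-locked loop and that of the second phase-locked loop being the same;
    the main chip is configured to determine the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop;
    the cache chip is configured to determine the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop;
    the main chip is configured to perform the first rate negotiation between the main chip and the cache chip based on the transmission rate of the serial interface of the main chip and the transmission rate of the serial interface in the cache chip;
    when the negotiation succeeds, determine the transmission rate of the serial interface in the main chip to be the first target rate at which the main chip accesses the cache chip;
    the cache chip is configured to, when the negotiation succeeds, determine the transmission rate of the serial interface in the cache chip to be the first target rate at which the main chip accesses the cache chip.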
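  8. The system according to claim 6 or 7, characterized in that
    the main chip is configured to control the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip;
    the cache chip is configured to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determine the highest rate supported by the dual in-line memory module interface, and use that highest rate as the second target rate;
    record the second target rate in the register of the cache chip.
  9. The system according to claim 8, characterized in that
    the main chip is configured to read the second target rate from the register of the cache chip;
    initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both equal the clock frequency corresponding to the second target rate;
    control the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
  10. The system according to any one of claims 6 to 9, characterized in that the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation.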
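  11. A frequency control apparatus, characterized in that the frequency control apparatus is applied to a system that extends memory through a serial interface, the system comprising a main chip, a cache chip, and at least one memory chip coupled to the cache chip, wherein a first phase-locked loop coupled to the serial interface in the main chip and a second phase-locked loop coupled to the serial interface in the cache chip share a common clock source, and both the dual in-line memory module interface in the cache chip and the serial interface in the cache chip are coupled to the second phase-locked loop, the frequency control apparatus comprising:
    a rate negotiation unit, configured to, after controlling the main chip and the cache chip to complete serial interface initialization, control the main chip and the cache chip to perform a first rate negotiation between them to determine a first target rate at which the main chip accesses the cache chip;
    a rate acquisition unit, configured to, after controlling the main chip to finish initializing the cache chip at the first target rate, determine a second target rate supported by the interface between the cache chip and the memory chip;
    the rate negotiation unit being further configured to configure the clock frequency of the first phase-locked loop in the main chip and the clock frequency of the second phase-locked loop in the cache chip to the clock frequency corresponding to the second target rate, and to control the main chip and the cache chip to complete a second rate negotiation.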
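  12. The frequency control apparatus according to claim 11, characterized in that the rate negotiation unit is configured to:
    control the first phase-locked loop in the main chip and the second phase-locked loop in the cache chip to perform frequency multiplication based on the common reference clock, obtaining the working clock frequency of the first phase-locked loop and the working clock frequency of the second phase-locked loop, the two working clock frequencies being the same;
    determine the transmission rate of the serial interface in the main chip from the working clock frequency of the first phase-locked loop, and determine the transmission rate of the serial interface in the cache chip from the working clock frequency of the second phase-locked loop;
    perform the first rate negotiation between the main chip and the cache chip based on the transmission rate of the serial interface of the main chip and the transmission rate of the serial interface in the cache chip, and, when the negotiation succeeds, determine these transmission rates to be the first target rate at which the main chip accesses the cache chip.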
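  13. The frequency control apparatus according to claim 11 or 12, characterized in that the rate acquisition unit is configured to:
    control the serial interface of the main chip to access the register of the cache chip at the first target rate so as to initialize the register of the cache chip;
    control the cache chip to access, through the dual in-line memory module interface, the memory chip connected to the cache chip, determine the highest rate supported by the dual in-line memory module interface, and use that highest rate as the second target rate;
    control the cache chip to record the second target rate in the register of the cache chip.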
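  14. The frequency control apparatus according to claim 13, characterized in that the rate negotiation unit is configured to:
    control the main chip to read the second target rate from the register of the cache chip;
    control the main chip to initialize the clock frequency of the first phase-locked loop and the clock frequency of the second phase-locked loop according to the second target rate, so that both equal the clock frequency corresponding to the second target rate;
    control the serial interface of the main chip and the serial interface of the cache chip to perform the second rate negotiation at the second target rate.
  15. The frequency control apparatus according to any one of claims 11 to 14, characterized in that the serial interface of the main chip and the serial interface of the cache chip support any specified integer rate during rate negotiation.
  16. A computer-readable storage medium, characterized in that it comprises computer instructions that, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 5.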
PCT/CN2023/091264 2022-05-06 2023-04-27 Method and apparatus for unifying clock frequency WO2023213224A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210488354.XA CN117056253A (zh) 2022-05-06 2022-05-06 Method and apparatus for unifying clock frequency
CN202210488354.X 2022-05-06

Publications (1)

Publication Number Publication Date
WO2023213224A1 true WO2023213224A1 (zh) 2023-11-09

Family

ID=88646253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091264 WO2023213224A1 (zh) 2022-05-06 2023-04-27 Method and apparatus for unifying clock frequency

Country Status (2)

Country Link
CN (1) CN117056253A (zh)
WO (1) WO2023213224A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150006806A1 (en) * 2013-06-26 2015-01-01 Electronics And Telecommunications Research Institute Double data rate synchronous dynamic random access memory module and configuring method thereof
US20180113628A1 (en) * 2016-10-21 2018-04-26 Advanced Micro Devices, Inc. Hybrid memory module bridge network and buffers
US20180225235A1 (en) * 2017-02-03 2018-08-09 Futurewei Technologies, Inc. Systems and methods for utilizing ddr4-dram chips in hybrid ddr5-dimms and for cascading ddr5-dimms
CN109639403A (zh) * 2018-11-26 2019-04-16 西南电子技术研究所(中国电子科技集团公司第十研究所) 同步传输数字阵列天线基带激励数据的方法
CN113406993A (zh) * 2021-07-16 2021-09-17 盛立安元科技(杭州)股份有限公司 基于恢复时钟的fpga芯片时钟域同步方法及相关设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HOLLIS TIMOTHY M.; STAVE ERIC; OVARD DAVE; GREEFF ROY; SPIRKL WORFGANG; BROX MARTIN; TAYLOR JENNIFER; BUTTERFIELD JUSTIN: "Recent Evolution in the DRAM Interface: Mile-Markers Along Memory Lane", IEEE SOLID-STATE CIRCUITS MAGAZINE, IEEE, USA, vol. 11, no. 2, 1 April 2019 (2019-04-01), USA , pages 14 - 30, XP011730603, ISSN: 1943-0582, DOI: 10.1109/MSSC.2019.2910617 *

Also Published As

Publication number Publication date
CN117056253A (zh) 2023-11-14


Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23799210

Country of ref document: EP

Kind code of ref document: A1