CN116756065A

CN116756065A - On-chip execution pre-reading circuit based on serial peripheral interface

Info

Publication number: CN116756065A
Application number: CN202310693485.6A
Authority: CN
Inventors: 谢士宁
Original assignee: Shenzhen Injoinic Technology Co Ltd
Current assignee: Shenzhen Injoinic Technology Co Ltd
Priority date: 2023-06-12
Filing date: 2023-06-12
Publication date: 2023-09-15

Abstract

The embodiment of the application provides an on-chip execution pre-reading circuit based on a serial peripheral interface, which comprises an SPI register module, an SPI interaction module and an SPI working module; in the pre-reading enabling starting state, the SPI interaction module is used for identifying burst reading operation on the bus, generating a pre-reading request and sending the pre-reading request to the SPI working module; the SPI working module is used for communicating with the external Flash memory according to a single reading request, sending the read data with the data length of 1 word sent by the external Flash memory to the SPI interaction module, or communicating with the external Flash memory according to a pre-reading request, and sending the read data with the data length of N words sent by the external Flash memory to the SPI interaction module, wherein N is the burst length supported by any bus protocol. By adopting the embodiment of the application, the time consumed by synchronous processing can be reduced, and the data transmission efficiency can be improved.

Description

On-chip execution pre-reading circuit based on serial peripheral interface

Technical Field

The present application relates to the field of electronic circuit design, and in particular, to an on-chip execution pre-reading circuit based on a serial peripheral interface.

Background

In order to reduce power consumption in a system-on-chip design, a high-speed clock is generally allocated to the SPI interface module exclusively for communication with the external memory, and the CPU only needs to operate at a normal clock frequency, which reduces overall power consumption.

Signal transmission between asynchronous clock domains requires synchronization. For example, a read request transmitted from the system clock domain to the SPI working clock domain must pass through pulse synchronization. When the read data is returned from the SPI working clock domain to the system clock domain, the read data READY indication signal must also pass through pulse synchronization, which reduces the working efficiency of the SPI when registers with continuous addresses are read.

In particular, when a CPU accesses a Cache frequently and a large amount of data needs to be read into the Cache, data of one Cache line length is generally read in a burst read mode each time. In the burst read mode the read addresses are continuous. If every CPU access request triggers its own data interaction process as described above, a great amount of time is wasted on synchronization and the working efficiency of the CPU is reduced.

Disclosure of Invention

The embodiment of the application provides an on-chip execution pre-reading circuit based on a serial peripheral interface, which can reduce the time consumed by synchronous processing and improve the data transmission efficiency.

Specifically, the circuit comprises an SPI register module, an SPI interaction module and an SPI working module;

the SPI register module is used for configuring the working mode of an SPI, and the working mode of the SPI configured by the SPI register module comprises a pre-reading enabling on state and a pre-reading enabling off state;

in the pre-reading enabling on state, the SPI interaction module is used for identifying a burst read operation on a bus, generating a pre-read request and sending the pre-read request to the SPI working module;

in the pre-reading enabled off state, the SPI interaction module is used for analyzing the read operation on the bus into a single read request and sending the single read request to the SPI working module;

the SPI working module is used for communicating with an external Flash memory according to the single read request and sending the read data with a data length of 1 word sent by the external Flash memory to the SPI interaction module, or for communicating with the external Flash memory according to the pre-read request and sending the read data with a data length of N words sent by the external Flash memory to the SPI interaction module, wherein N is the burst length supported by any bus protocol.

Preferably, the circuit includes a clock management unit configured to provide two master clock signals: a slow clock signal for a slow clock domain and a high-speed clock signal for a high-speed clock domain. The slow clock domain includes the SPI register module and the SPI interaction module, and the high-speed clock domain includes the SPI working module.

Preferably, the SPI interaction module is configured to parse a read operation on a bus and parse a read address on an SPI register module, send the read operation and the read address to the high-speed clock domain, and return received read data sent by the high-speed clock domain to the bus.

Preferably, the SPI working mode configured by the SPI register module further comprises SPI mode0/mode3 mode selection, Dual SPI/Quad SPI/Standard SPI interface mode selection, the time to wait for a continuous-address read request in the WAIT state, and the delay chain size of the pin delay compensation circuit; the SPI register module is also used for configuring information related to the command sequence to be sent by the SPI, wherein the related information comprises the read instruction of the command sequence, the number of DUMMY clocks to send and the continuous read mode bits M7-0 to send.

Preferably, an address matching processing unit is arranged in the SPI interaction module. The address matching processing unit is used for ensuring that the read data returned to the bus corresponds one-to-one to the read addresses. In the pre-reading enabling on state, the address matching processing unit is used for judging whether the current pre-reading working state is working normally according to whether the read addresses on the bus are continuous.

Preferably, a timeout forced release bus mechanism is arranged in the SPI interaction module, and the timeout forced release bus mechanism is used for preventing the SPI interaction module from occupying the bus for more than a preset period of time.

Preferably, the communication rate of the SPI working module is consistent with the frequency of the working clock in the SPI working module; the communication clock SCK generated by the SPI working module is obtained by controlling the output of the internal working clock according to a clock gating mode.

Preferably, the SPI operating module is provided with a pin delay compensation circuit, and the pin delay compensation circuit is used for compensating delay of sampling the received data clock and the SCK signal of the pin.

Preferably, the SPI working module is configured to generate an SPI working timing according to a preset finite state machine, where the finite state machine includes 8 working states: IDLE state, START state, CMD transmission state, ADDR transmission state, ADDR_M transmission state, DUMMY transmission state, RXDAT transmission state, and WAIT transmission state.

Preferably, the SPI working module controls the state machine to jump to different working states according to the read request type.

Preferably, the received data buffer unit in the SPI operation module uses an asynchronous FIFO with depth of 3 layers and width of 8 bits; in the pre-reading working state, an asynchronous FIFO with the depth of 2 layers and the width of 32 bits is adopted for caching pre-acquired data.

Compared with the prior art, the embodiment of the application reads a plurality of words of data at one time by providing the pre-reading working state, thereby reducing the time the CPU waits for data to return. Further, an asynchronous FIFO with a depth of 2 layers and a width of 32 bits is used as the buffer unit for pre-read data, and the SPI working module writes each word of data it reads back into this FIFO. The SPI interaction module acts as the read side of the FIFO and reads the FIFO as soon as it is not empty. The preset address matching processing unit judges whether the read addresses are continuous; if so, the read data is sent to the bus, and if not, the read data is not sent to the bus. In the pre-reading working state, when a read request with a discontinuous address appears on the bus, the SPI interaction module reads out but discards the data in the FIFO and latches the current read request type and read address. When the SPI working module exits the current pre-reading working state, the latched read request is resent, which ensures the correctness of data transmission.

Furthermore, the SPI communication timing is generated by clock gating, so that SPI communication can run at a divide-by-one rate of the SPI working module clock, that is, at the same frequency as that clock. After entering the pre-reading working state, the SPI working module can complete the interaction of multiple bytes of data using only 2 layers of FIFO resources. Compared with caching the data in a buffer, this saves a large amount of register resources, reduces the CPU waiting time, and improves the data transmission efficiency.

Drawings

The drawings that are used in the description of the embodiments will be briefly described below.

FIG. 1 is a schematic diagram of an on-chip pre-read circuit based on a serial peripheral interface according to an embodiment of the present application;

FIG. 2 is a schematic diagram of another on-chip pre-read circuit based on a serial peripheral interface according to an embodiment of the present application;

FIG. 3 is a flowchart of SPI interaction module read data processing provided by an embodiment of the present application;

FIG. 4 is a flowchart of a data processing when a pre-read operation state is on according to an embodiment of the present application;

FIG. 5 is a timing diagram of a read timeout detection and bus release according to an embodiment of the present application;

FIG. 6 is a schematic diagram of yet another on-chip implementation pre-read circuit based on a serial peripheral interface provided by an embodiment of the present application;

FIG. 7 is a timing diagram of key signals between an SPI interaction module and an SPI working module provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an SPI working module according to an embodiment of the present application;

FIG. 9 is a schematic diagram of an SPI timing generation working state machine according to an embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

XIP (eXecute In Place) refers to a mode of operation in which the CPU reads program code directly from the external memory, without first loading it into RAM. Execution in Flash means that the NOR Flash does not need to be initialized and code in the Flash can be executed directly. The key step is instruction fetch: the Flash must satisfy the CPU's requirement that one address corresponds to one instruction or one data item. To realize XIP, the CPU needs a dedicated SPI controller that converts the serial Flash bus to a parallel bus. As its basic function, the controller must allow the CPU to read any address space in the memory and must correctly return the read data to the CPU for further execution, that is, one address corresponds to one data item. According to the operating characteristics of Flash, data of any byte length can be read out by one read command within one read command sequence, and the data read out is address-continuous. Based on this characteristic, when read operations with continuous addresses occur, the SPI controller can read more than 1 word of data in one command sequence until a read operation with a discontinuous address occurs, which matches the working mechanism of the Cache well.

The Cache is a high-speed buffer memory. Its basic principle is to exploit the spatial locality and temporal locality of a program: the data currently accessed by the processor is read out from a low-speed memory and written into the Cache, and data at adjacent addresses is read from the low-speed memory into the Cache in advance, so that the processor can read directly and quickly from the Cache when it later accesses this data or the data at adjacent addresses. Working together with the Cache, the SPI module can read back 8 words of data in one read command sequence and fill one Cache line. With the support of the Cache, the program execution efficiency in XIP mode can be greatly improved.

In order to reduce power consumption in a system design, a high-speed clock is generally allocated exclusively to the interface module that communicates with the external memory, and the CPU only needs to operate at a normal clock frequency, which reduces overall power consumption. Signal transmission between asynchronous clock domains requires synchronization. For example, a read request transmitted from the system clock domain to the SPI working clock domain must pass through pulse synchronization. When the read data is returned from the SPI working clock domain to the system clock domain, the read indication signal of the read data must also pass through pulse synchronization, which reduces the working efficiency of the SPI when registers with continuous addresses are read. In particular, when a CPU accesses a Cache frequently and a large amount of data needs to be read into the Cache, data of one Cache line length is generally read in a burst read mode each time. In the burst read mode the read addresses are continuous. If every CPU access request triggers its own data interaction process as described above, a great amount of time is wasted on synchronization and the working efficiency of the CPU is reduced.

In view of the above, the present embodiment provides an on-chip execution pre-reading circuit based on a serial peripheral interface. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the circuit, which includes an SPI register module, an SPI interaction module, and an SPI working module.

The SPI register module is used for configuring an SPI working mode, the SPI working mode configured by the SPI register module comprises a pre-reading enabling on state and a pre-reading enabling off state, and it can be understood that the SPI working mode configured by the SPI register module also comprises other related configurations;

in the pre-reading enabling on state, the SPI interaction module is used for identifying a burst read operation on a bus, generating a pre-read request and sending the pre-read request to the SPI working module. It should be noted that the pre-reading enabling on state and the pre-reading working state are different: the pre-reading enabling on state does not mean that the SPI-related modules have entered the pre-reading working state. Pre-reading enabling is turned on and off by the bus through the SPI register module. If pre-reading enabling is turned on, the SPI interaction module enters the pre-reading working state in response to a burst read request on the bus; if pre-reading enabling is turned off, the SPI interaction module does not enter the pre-reading working state even if a burst read request exists on the bus. The SPI interaction module therefore enters the pre-reading working state only when both conditions hold: the bus has turned on pre-reading enabling through the SPI register module, and the SPI interaction module has received a burst read request from the bus.

The SPI register module in fig. 1 is configured by a bus and can be accessed and read by the bus.

Optionally, in the pre-reading working state, the SPI working module uses an asynchronous FIFO with a depth of 2 layers and a width of 32 bits as the buffer unit for pre-read data, and writes one word of data into the asynchronous FIFO each time it reads back one word. The SPI interaction module acts as the read side of the FIFO and reads the FIFO as soon as it is not empty. A preset address matching processing unit judges whether the read addresses are continuous; if so, the read data is sent to the bus, and if not, the read data is not sent to the bus, thereby reducing the time the CPU waits for data to return.

Optionally, since the data from the external memory is an asynchronous signal as far as the SPI working module is concerned, the receive data buffer unit in the SPI working module uses an asynchronous FIFO with a depth of 3 layers and a width of 8 bits. This asynchronous FIFO serves as the synchronization unit through which the SPI working module reads data from the external memory, and it synchronizes the asynchronous data that the SPI working module reads from the memory.
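
The relationship between the 8-bit receive FIFO and the 32-bit pre-read FIFO can be illustrated with a minimal Python sketch that groups received bytes into words. The function name `bytes_to_words` is hypothetical, and the byte order inside a word is an assumption (little-endian by default), since the text does not state it. The sketch also ignores the 3-layer depth limit and the clock domain crossing.

```python
from collections import deque

def bytes_to_words(rx_bytes, byteorder="little"):
    """Drain an 8-bit receive FIFO and emit one 32-bit word per 4 bytes.

    The byte order is an assumption; the source text does not specify it.
    """
    rx_fifo = deque(rx_bytes)          # stands in for the 8-bit-wide FIFO
    words = []
    while len(rx_fifo) >= 4:
        chunk = bytes(rx_fifo.popleft() for _ in range(4))
        words.append(int.from_bytes(chunk, byteorder))
    return words                       # bytes of an incomplete word stay behind
```

Each emitted word corresponds to one write into the 32-bit pre-read FIFO.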

Referring to fig. 2, fig. 2 is a schematic diagram showing an on-chip pre-reading circuit based on a serial peripheral interface according to another embodiment of the present application; the SPI interaction module in FIG. 2 comprises a bus communication processing unit, a read request analysis unit, an address matching processing unit, a pre-read mode data processing unit, a pre-read address exception processing unit and an under-exception bus release processing unit.

The bus communication processing unit is used for being responsible for communication with a bus and comprises receiving a read instruction and a read address of the bus and controlling a read data return time sequence, and all units in the SPI interaction module can transmit data information with the bus through the bus communication processing unit;

the read request analysis unit is used for identifying the read operation type of the bus and the read address on the register, sending a single/pre-read request to the SPI working module, and returning the read data sent by the SPI working module to the bus;

in the pre-reading working state, after the SPI interaction module recognizes a reading request of a bus, the current reading address is latched, the address matching processing unit compares the current bus reading address with a previous reading address value according to a word alignment mode, and if the address position of a read register which is analyzed currently is the next address of the previous reading operation, reading data in an asynchronous FIFO with the depth of 2 layers and the width of 32 bits is read and returned to the bus. If the bus read address is detected to be discontinuous, the pre-read abnormal processing state is entered, and the current bus read operation type and the read address are latched. In the pre-read exception handling state, the read data in the asynchronous FIFO is still read until the data in the asynchronous FIFO is completely fetched.

In a non-pre-reading working state, the SPI interaction module latches a current read address after recognizing a read request of a bus, the address matching processing unit compares the current bus read address with a previous read address value according to a word alignment mode, if the address position of a read register which is currently analyzed is the next address of a previous read operation, a single/pre-reading request with continuous addresses is generated and sent to the SPI working module, and the SPI working module can process according to the state of a current state machine after receiving the continuous single/pre-reading request. It should be noted that the non-read-ahead operating state may be a read-ahead enabled off state.
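
The word-aligned comparison performed by the address matching processing unit in both states can be sketched as follows. This is a minimal illustration in Python; the function name `is_next_word` is hypothetical, and a 4-byte word is assumed from the 32-bit FIFO width.

```python
WORD_BYTES = 4  # one word is 32 bits (assumed from the 32-bit FIFO width)

def is_next_word(prev_addr: int, curr_addr: int) -> bool:
    """Return True when curr_addr falls in the word right after prev_addr."""
    prev_word = prev_addr // WORD_BYTES   # drop the byte-offset bits
    curr_word = curr_addr // WORD_BYTES
    return curr_word == prev_word + 1
```

For example, `is_next_word(0x100, 0x104)` is true, while `is_next_word(0x100, 0x108)` is false and would be treated as a discontinuous address.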

The SPI interaction module is provided with an under-exception bus release processing unit, which counts the 32 ms pulses produced by frequency division in the system clock management unit. When it detects that the SPI module has held the bus HREADY output low for more than 32 ms, the SPI module pulls HREADY high to release the bus. This prevents the SPI module from occupying the bus indefinitely under abnormal conditions, which would lock up the system and stop it from operating.

In an alternative embodiment, the SPI interaction module treats only a burst read operation with a burst length of 8 on the bus as a pre-read request, that is, N = 8; all other read operations are treated as single read requests. Referring to FIG. 3, FIG. 3 is a flowchart of read data processing by the SPI interaction module according to an embodiment of the present application. For a pre-read request, the SPI working module, after receiving the request, sends the command sequence of the specific read command selected by the configured interface mode bits and reads 8 words of data from the memory in one command sequence. For a single read request, 1 word of data is read from the memory in one command sequence.
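
The decision between a pre-read request and a single read request in this embodiment can be sketched in a few lines of Python. The names `ReadRequest` and `classify_read` are hypothetical and used only for illustration; the rule itself (pre-reading enabled and burst length 8) follows the text.

```python
from dataclasses import dataclass

PREREAD_BURST_LEN = 8  # N = 8 in this embodiment

@dataclass
class ReadRequest:
    kind: str      # "preread" or "single"
    address: int   # read address taken from the bus
    words: int     # words fetched from the Flash in one command sequence

def classify_read(address: int, burst_len: int, preread_enabled: bool) -> ReadRequest:
    """Turn a bus read operation into a pre-read or single read request."""
    if preread_enabled and burst_len == PREREAD_BURST_LEN:
        return ReadRequest("preread", address, PREREAD_BURST_LEN)
    # Every other read, or any read while pre-reading is disabled, is single.
    return ReadRequest("single", address, 1)
```

With pre-reading enabled, a burst of length 8 yields a request for 8 words; the same burst with pre-reading disabled, or a burst of any other length, yields a request for 1 word.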

Referring to FIG. 4, FIG. 4 is a flowchart of data processing when the pre-reading working state is on, provided in an embodiment of the present application. First, the information sent by the bus is parsed and it is determined that pre-reading enabling is on. After the pre-reading working state is entered, the address matching processing unit checks whether the bus read addresses are continuous and thereby determines whether the current pre-reading state is working normally. If the bus read addresses are continuous in the current working state, HREADY is pulled high in each cycle in which FIFO data is read, and the acquired read data is sent back to the bus. Since one Cache line in this embodiment has a length of 8 words, for each pre-read request the SPI working module must send a command sequence with a receive data length of 8 words, and it writes data into the FIFO once for each word received. In one complete pre-reading working state, 8 read requests must be issued to the FIFO, and the FIFO is read as soon as it is not empty. Each time a read request is issued, a read data counter is incremented by 1; when the count reaches 8, the counter is cleared and the pre-reading working state is exited. If the bus read addresses are detected to be discontinuous, the pre-read exception handling state is entered and the current bus read operation type and read address are latched.

In the pre-read exception handling state, the data in the asynchronous FIFO continues to be read until all 8 words of data written by the SPI working module have been taken out. The HREADY signal is held low throughout this reading because the current read data does not match the current read request address on the bus. After the 8 words of data have been read, the pre-reading working state ends, and the previously latched bus read request and read address are sent to the SPI working module to start a new read command sequence.
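
The normal and exception paths of one pre-reading working state can be followed in a small behavioral sketch. It is written in Python under simplifying assumptions: the two clock domains are collapsed into one loop, the FIFO is read immediately after each write, and the name `run_preread` and its return format are hypothetical.

```python
from collections import deque

BURST_WORDS = 8   # words per pre-read command sequence in this embodiment
FIFO_DEPTH = 2    # depth of the 32-bit pre-read FIFO
WORD_BYTES = 4

def run_preread(start_addr, flash_words, bus_addrs):
    """Simulate one pre-reading working state.

    flash_words: the 8 words the SPI working module writes into the FIFO.
    bus_addrs:   the read addresses presented on the bus, in order.
    Returns (returned, latched): the (address, word) pairs sent back to the
    bus, and the latched address to re-request afterwards (None if none).
    """
    if len(flash_words) != BURST_WORDS:
        raise ValueError("a pre-read command sequence returns exactly 8 words")
    fifo = deque()
    returned = []
    latched = None
    expected = start_addr
    bus = iter(bus_addrs)
    for word in flash_words:
        fifo.append(word)                  # working module writes one word
        assert len(fifo) <= FIFO_DEPTH
        data = fifo.popleft()              # read as soon as the FIFO is not empty
        if latched is not None:
            continue                       # exception state: read out and discard
        addr = next(bus)
        if addr == expected:
            returned.append((addr, data))  # HREADY high, data goes to the bus
            expected += WORD_BYTES
        else:
            latched = addr                 # HREADY stays low, request is latched
    return returned, latched
```

If the bus asks for `0x0, 0x4, 0x8` and then jumps to `0x40`, the first three words are returned, the remaining five are drained and discarded, and `0x40` is the latched address that starts the next command sequence.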

While receiving data, the SPI controller needs to convert the serial input data into parallel data; the time taken by this serial-to-parallel conversion is called the data preparation time. During data preparation, the SPI interaction module pulls the READY signal on the bus low.

If, under abnormal conditions, the SPI interaction module never receives the complete data, the bus would remain in a waiting state indefinitely. A mechanism is therefore needed that detects the abnormal working condition, releases the bus, and lets the system continue to work. This mechanism must decide under what condition the bus is considered abnormally occupied by the SPI controller module. The decision is based on how long the SPI module has held READY low: if this time exceeds a preset value (preferably 32 ms), the SPI controller is considered to have encountered an abnormality while receiving data. In the application, the system clock management unit divides the RC crystal oscillator clock to obtain a 32 ms pulse signal that serves as the reference clock. While READY is low, a counter is incremented by 1 each time a rising edge of the 32 ms pulse is detected. When the counter reaches 2, the time for which the SPI module has held READY low is considered to have timed out (> 32 ms); the SPI module then pulls the READY signal high and clears the counter. The advantage of this design is that it saves resources: the timeout detection function needs only a 2-bit counter. Referring to FIG. 5, FIG. 5 is a timing diagram of the read data timeout detection and bus release provided by the embodiment of the application.
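
The timeout counter described above can be modeled sample by sample. This is a rough Python sketch; the function name `ready_with_timeout` and the sampled-signal representation are assumptions made for illustration. Counting two rising edges guarantees that READY has been low for at least one full 32 ms period, because the first edge may arrive at any point after READY goes low.

```python
def ready_with_timeout(ready_low, pulse_32ms):
    """Return the READY level driven on the bus for each sample.

    ready_low:  iterable of bools, True while the SPI module wants READY low.
    pulse_32ms: iterable of 0/1 samples of the 32 ms reference pulse.
    """
    counter = 0        # fits in 2 bits: it only ever holds 0, 1 or 2
    prev_pulse = 0
    forced = False     # True once the timeout has released the bus
    out = []
    for want_low, pulse in zip(ready_low, pulse_32ms):
        rising = pulse == 1 and prev_pulse == 0
        prev_pulse = pulse
        if not want_low:
            counter = 0
            forced = False
        elif rising and not forced:
            counter += 1
            if counter == 2:
                forced = True   # timeout: pull READY high
                counter = 0     # and clear the counter
        out.append(1 if (forced or not want_low) else 0)
    return out
```

In this model READY is released on the second rising edge seen while it is low; a read that completes earlier simply resets the counter.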

Referring to FIG. 6, FIG. 6 shows still another on-chip execution pre-reading circuit based on a serial peripheral interface according to an embodiment of the present application, in which a PLL provides the clock signals. According to the configuration of the CPU clock management unit, the PLL provides two master clock signals: the slow clock signal is provided to the slow clock domain and the high-speed clock signal is provided to the high-speed clock domain.

The slow clock domain includes: SPI register module, SPI interaction module.

The high-speed clock domain includes the SPI working module, which generates the SPI working timing using a finite state machine, is responsible for communication with the external Flash memory, and returns the received read data to the SPI interaction module.

In the circuit of the application, the PLL provides two master clocks: the slow clock is used for the data interaction part, and the high-speed clock is used for the protocol implementation of the SPI interface, which transmits and receives the external Flash data. With this division of clock domains, most of the controller can work in the low-frequency clock domain and only the SPI interface part sits in the high-frequency circuit, so the transmission rate is effectively improved while stability is maintained. When the CPU has no need to access the Flash, the PLL can be configured to switch off the clock input of the SPI interface, which saves overall system power.

According to the embodiment of the application, a plurality of words of data are read at one time by providing pre-reading enabling, which reduces the time the CPU waits for data to return. Further, an asynchronous FIFO with a depth of 2 layers and a width of 32 bits is used as the buffer unit for pre-read data, and the SPI working module writes each word of data it reads back into this FIFO. The SPI interaction module acts as the read side of the FIFO and reads the FIFO as soon as it is not empty. The preset address matching processing unit judges whether the read addresses are continuous; if so, the read data is sent to the bus, and if not, the read data is not sent to the bus. In the pre-reading working state, when a read request with a discontinuous address appears on the bus, the SPI interaction module reads out but discards the data in the FIFO and latches the current read request type and read address. When the SPI working module exits the current pre-reading working state, the latched read request is resent, which ensures the correctness of data transmission. In addition, the communication rate of the SPI working module is consistent with the frequency of the working clock in the SPI working module.

Furthermore, the SPI communication timing is generated by clock gating, so that SPI communication can run at a divide-by-one rate of the SPI working module clock, that is, at the same frequency as that clock. In the pre-reading working state, only 2 layers of FIFO resources are needed to complete the interaction of multiple bytes of data. Compared with caching the data in a buffer, this saves a large amount of register resources, reduces the CPU waiting time, and improves the data transmission efficiency.

In an alternative embodiment, the SPI register module is used to configure the working mode of the SPI and the information related to the command sequence to be sent. The CPU passes configuration information to the SPI module by accessing the SPI registers. The configurable items include the mode0/mode3 working mode, the Standard/Dual/Quad SPI interface mode, the number of DUMMY clocks in the command sequence, the continuous read mode bits M7-0 of the command sequence, and the like. Configuring the interface mode bits of the SPI control register also determines the read command to be sent and the read command sequence. Taking the GD25Q16B Flash memory as an example, Table 1 shows the relationship between the configuration value of the interface mode bits in the SPI control register, the SPI mode, and the read command of the command sequence:

TABLE 1

SPI Mode	Command	Instruction name	Register configuration value
Standard SPI	0x03	Read Data Bytes	000
Dual SPI	0x3B	Dual Output Fast Read	010
Dual SPI	0xBB	Dual I/O Fast Read	011
Quad SPI	0x6B	Quad Output Fast Read	100
Quad SPI	0xEB	Quad I/O Fast Read	101
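
Table 1 can be read as a lookup from the interface mode bits to the read command. The Python sketch below shows that mapping; the names `READ_COMMANDS` and `read_command` are hypothetical, and only the five configuration values listed in Table 1 are covered.

```python
# interface mode bits -> (SPI mode, read command opcode, instruction name)
READ_COMMANDS = {
    0b000: ("Standard SPI", 0x03, "Read Data Bytes"),
    0b010: ("Dual SPI", 0x3B, "Dual Output Fast Read"),
    0b011: ("Dual SPI", 0xBB, "Dual I/O Fast Read"),
    0b100: ("Quad SPI", 0x6B, "Quad Output Fast Read"),
    0b101: ("Quad SPI", 0xEB, "Quad I/O Fast Read"),
}

def read_command(mode_bits: int):
    """Look up the read command selected by the interface mode bits."""
    try:
        return READ_COMMANDS[mode_bits]
    except KeyError:
        raise ValueError(
            f"interface mode bits {mode_bits:03b} are not listed in Table 1"
        ) from None
```

Values that Table 1 does not list raise an error here rather than being guessed.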

Referring to FIG. 7, FIG. 7 is a timing diagram of the key signals exchanged between the SPI interaction module and the SPI working module according to the present application. Because the clock domain of the SPI working module is asynchronous to that of the interaction module, key signals such as the read request pulse and the data receiving READY signal must be synchronized before they cross the clock domain boundary. The parsed read request pulse is processed by the pulse synchronization module and passed from the hclk domain to the spi_clk domain, where the hclk domain is the bus working clock domain and the spi_clk domain is the SPI working clock domain. Similarly, the data receiving READY indication signal returned to the interaction module by the SPI working module is also processed by the pulse synchronization module and synchronized from the spi_clk clock domain to the hclk clock domain. Referring to FIG. 8, FIG. 8 is a schematic structural diagram of the SPI working module provided by the present application.
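
The internal structure of the pulse synchronization module is not given in this text. As a purely illustrative assumption, the Python sketch below models a common toggle-based pulse synchronizer: the source domain flips a level on every pulse, and the destination domain passes that level through flip-flops and detects the change. The class name `PulseSync` and its methods are hypothetical.

```python
class PulseSync:
    """Behavioral model of a toggle-based pulse synchronizer (assumed design)."""

    def __init__(self):
        self.toggle = 0         # source-domain flop, flips on each pulse
        self.sync = [0, 0, 0]   # destination-domain flops

    def src_pulse(self):
        """A one-cycle pulse arrives in the source clock domain."""
        self.toggle ^= 1

    def dst_clock(self) -> int:
        """One destination clock edge; returns 1 for one cycle per source pulse."""
        s0, s1, _ = self.sync
        self.sync = [self.toggle, s0, s1]
        return self.sync[1] ^ self.sync[2]   # level change means a pulse arrived
```

In this model each crossing costs two destination clock cycles before the pulse appears, which gives a sense of the per-request synchronization overhead that pre-reading is meant to amortize.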

The SPI working module is the module that generates the SPI communication timing. A standard SPI interface generally comprises four signal lines: SCK, MOSI, MISO and CS#. Application scenarios dedicated to Flash communication often adopt a four-line QSPI interface; compared with a standard SPI interface, the SPI working module provided by the embodiment of the application has two additional data signal lines, HOLD# and WP#. SCK is the clock line used to output or receive SPI transmission data. MOSI serves as the data output signal line in the transmitting state and can also serve as a data input in the 2-line or 4-line mode. MISO serves as the data input signal line in the receiving state and can also serve as a data output in the 2-line or 4-line mode. CS# is the chip select signal line; when it is pulled low, communication is in progress. HOLD# and WP# are data lines in the 4-line mode and may be used as data input or output ports.

The output clock spi_sck_out at the interface of the SPI controller runs at the same rate as the module input clock spi_clk. This is implemented by passing spi_clk through a clock gating unit; the signal output while the gating enable is active serves as the communication clock between the SPI controller and the Flash. The enable signal of the clock gating unit is generated according to the working state of the finite state machine. While it is active, one spi_sck_out communication clock is output for every spi_clk clock period, indicating that data is transmitted or received once. In the data transmission phase, data should be sent out on the falling edge. To build the sequential logic for the output data, an inverter generates the inverted clock spi_clk_n, and the data in the internal tx_buffer is shifted out on the rising edge of spi_clk_n (which coincides with the falling edge of spi_clk). The clock gating unit that generates the SPI communication clock SCK must also account for the difference in SCK between mode0 and mode3. In mode0, SCK is low when idle, so to keep the gated clock waveform free of glitches the gating enable must be pulled high on the falling edge of spi_clk. In mode3, the gating enable must be pulled high on the rising edge of spi_clk. This ensures that the SCK clock generated after gating is an ideal waveform without glitches.

When receiving the data returned by the Flash, the SPI controller does not sample directly with the main clock spi_clk of the working module. Instead, spi_clk is delayed by a Delay Cell unit and the delayed clock is used as the receive sampling clock. This accounts for the path delay difference between the clock at the Flash pin and the receive sampling clock inside the SPI controller; the difference comes mainly from the delay of the data line input passing through the PAD pin and the path delay of the clock signal from the interface clock generating unit to the PAD pin. The Delay Cell unit delays the signal by inserting inverters and buffers at the signal input, and the delay time is selectable through register configuration. After the chip returns from tape-out, the Delay Cell size is adjusted through the configuration register to find the optimal sampling point and cancel the effect of the path delay difference. Because reading data from the external Flash is an asynchronous transfer, an asynchronous FIFO is used for synchronization: the read data captured by the sampling clock is first written into the asynchronous FIFO, then read out and stored in the read data register and sent to the SPI interaction module, and a read data READY pulse signal is generated at the same time.
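The application states that the optimal sampling point is found by adjusting the Delay Cell size through a configuration register, but it does not give the search procedure. One plausible approach, sketched here purely as an assumption, is to test every delay setting against a known data pattern and then choose the middle of the widest run of settings that read correctly. The function name pick_delay_tap and the pass/fail list are hypothetical.

```python
from typing import Optional, Sequence


def pick_delay_tap(passed: Sequence[bool]) -> Optional[int]:
    """Return the middle index of the longest run of passing delay settings.

    passed[i] is True when delay setting i read back the expected data.
    Returns None when no setting passed.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, ok in enumerate(passed):
        if ok:
            if run_len == 0:
                run_start = i  # a new passing window begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0        # window closed by a failing setting
    if best_len == 0:
        return None
    return best_start + (best_len - 1) // 2
```

Choosing the centre of the window rather than its first passing setting leaves margin on both sides against voltage and temperature drift.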

The SPI working module internally generates SPI working time sequence by utilizing a finite state machine FSM and is used for communicating with an external Flash memory. Referring to fig. 9, fig. 9 is a schematic diagram of an SPI timing generation operating state machine according to an embodiment of the application, wherein the SPI timing generation circuit includes 8 operating states in total: IDLE state, START state, CMD transmission state, ADDR transmission state, addr_m transmission state, DUMMY transmission state, RXDAT transmission state, WAIT transmission state.

When the working module receives a read request, a read command sequence is started. First, the state machine jumps from the IDLE state to the START state, which prepares the SPI transmission clock SCK and pulls the chip select signal CS# low. The CMD transmission state and the ADDR transmission state indicate that the SPI timing currently being transmitted carries the instruction and the register address information, respectively. The state entered next depends on the interface mode bits of the SPI control register.

The corresponding sending states under different configuration values of the interface mode bit are as follows:

000: the configured read command is 0x03H; the FSM enters the receive data state RXDAT after the address data is sent;

011: the configured read command is 0xBBH; the FSM enters the ADDR_M state, sends the continuous read mode bits M7-0, and then enters the RXDAT state;

101: the configured read command is 0xEBH; the FSM enters the ADDR_M state, enters the DUMMY state after the continuous read mode bits M7-0 are sent, and finally enters the RXDAT state after the dummy clocks are sent;

010/100: the configured read command is 0x3BH/0x6BH; the FSM enters the DUMMY state and then the RXDAT state.

The continuous read mode bits M7-0 sent in the ADDR_M state are 8 bits of data configured by the corresponding register; the number of dummy clocks sent in the DUMMY transmission state is also configured by the SPI control register. When the FSM reaches these states, the transmission timing is generated automatically according to the configuration of the corresponding registers.
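The state paths described above can be summarized in a short sketch. The interface mode values are taken from Table 1 (so 0x3B is keyed by 010), the state names follow fig. 9, and the function name read_sequence_states is hypothetical.

```python
from typing import List

# States between ADDR and RXDAT for each interface mode value (per Table 1).
_MIDDLE_STATES = {
    0b000: [],                   # 0x03: data follows the address directly
    0b010: ["DUMMY"],            # 0x3B: dummy clocks, then data
    0b011: ["ADDR_M"],           # 0xBB: mode bits M7-0, then data
    0b100: ["DUMMY"],            # 0x6B: dummy clocks, then data
    0b101: ["ADDR_M", "DUMMY"],  # 0xEB: mode bits M7-0, dummy clocks, then data
}


def read_sequence_states(mode_bits: int) -> List[str]:
    """States visited by one read command sequence, from IDLE to WAIT."""
    if mode_bits not in _MIDDLE_STATES:
        raise ValueError(f"mode bits {mode_bits:03b} are not listed in Table 1")
    return (["IDLE", "START", "CMD", "ADDR"]
            + _MIDDLE_STATES[mode_bits]
            + ["RXDAT", "WAIT"])


path = read_sequence_states(0b101)
# path == ["IDLE", "START", "CMD", "ADDR", "ADDR_M", "DUMMY", "RXDAT", "WAIT"]
```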

The RXDAT state is the read data period. Depending on the type of command sequence sent, the Flash memory returns data in the single-, two- or four-line SPI data transfer mode. In the non-pre-reading working state, each time 32 bits of data are received in the RXDAT state, a READY signal is generated, the received read data is returned to the SPI interaction module, and the FSM enters the WAIT state. In the pre-reading mode, each time 32 bits of data are received they are written into the asynchronous FIFO, and the FSM enters the WAIT state once data amounting to one cache line has been received.
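As a rough illustration of how long the FSM stays in the RXDAT state, the sketch below counts SCK cycles for both working states. It assumes one bit per data line per SCK cycle and a cache line of 8 words (the figure used in the system example of this description); the function name rxdat_cycles is hypothetical.

```python
def rxdat_cycles(lanes: int, pre_read: bool, line_words: int = 8) -> int:
    """SCK cycles spent in RXDAT before the FSM moves to WAIT.

    lanes      -- data lines used for the read data (1, 2 or 4)
    pre_read   -- True for the pre-reading mode, False for a single read
    line_words -- 32-bit words per cache line (assumed to be 8 here)
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    words = line_words if pre_read else 1
    return words * 32 // lanes  # each SCK cycle delivers `lanes` bits


single_quad = rxdat_cycles(4, pre_read=False)  # 8 cycles for one 32-bit word
line_quad = rxdat_cycles(4, pre_read=True)     # 64 cycles for an 8-word line
```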

In the WAIT state, the SPI working module keeps the communication with the Flash memory open but does not send the communication clock SCK. If an address-continuous read request is received in the WAIT state, the current command sequence continues: the FSM enters the START state to start generating the SPI communication clock SCK again and then jumps directly to the RXDAT state to continue receiving read data. If waiting for the next address-continuous read request times out in the WAIT state, the FSM enters the IDLE state, CS# is pulled high to end the communication with the Flash, and the read command sequence ends.

It should be noted that each state transition is taken according to the transition condition labeled beneath the corresponding serial number in fig. 9. For example, when an address-continuous read request is received in the WAIT state, the current command sequence continues and the FSM enters the START state to generate the SPI communication clock SCK; the corresponding transition condition is "wait_2strt" at serial number 10 in fig. 9. Likewise, when waiting for the next address-continuous read request times out in the WAIT state and the FSM enters the IDLE state, the corresponding transition condition is "wait_2idle" at serial number 11 in fig. 9.
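The two exits from the WAIT state can be expressed as a next-state function. This is a minimal sketch: the cycle counter, the timeout parameter and the rule that a request takes priority over a simultaneous timeout are assumptions, and the function name wait_next_state is hypothetical.

```python
def wait_next_state(continuous_request: bool,
                    waited_cycles: int,
                    timeout_cycles: int) -> str:
    """Next FSM state when the current state is WAIT.

    continuous_request -- an address-continuous read request has arrived
    waited_cycles      -- cycles already spent in WAIT
    timeout_cycles     -- configured waiting time (register-configurable)
    """
    if continuous_request:
        return "START"  # condition wait_2strt: resume SCK, then go to RXDAT
    if waited_cycles >= timeout_cycles:
        return "IDLE"   # condition wait_2idle: raise CS#, end the sequence
    return "WAIT"       # keep CS# low, SCK stopped
```

Keeping CS# low while waiting is what lets a follow-up read skip the CMD and ADDR phases.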

The working process of the whole system is as follows. After power-up, the program runs in the default single-channel mode using the "03" command, and the rate is kept at 24MHz. After the program jump instruction is executed, the CPU configures the clock management unit, controls the PLL to raise the SPI module working clock to 96MHz, configures the SPI working mode to 4 channels, and enables pre-reading. At this point, a logic analyzer shows that the clock and data rates on the SPI pins are clearly higher, and the program in the serial Flash runs at 384 Mbps. When the Cache mode is enabled and the pre-reading working mode is maintained, the clock and data on the SPI pins run continuously. From sending the read command to reading back 8 words, the data takes 84 SPI clock cycles; that is, filling one cache line takes only 875ns. Meanwhile the CPU still works at 24MHz, the program executes almost as fast as from a parallel memory, and power consumption is greatly reduced.
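The figures above can be checked with a short calculation. The breakdown of the 84 cycles is not given in the application; the sketch assumes a Quad I/O Fast Read (0xEB) sequence with an 8-clock command on one line, a 24-bit address and the 8 mode bits M7-0 on four lines, 4 dummy clocks, and 8 words of data on four lines. All parameter names are hypothetical.

```python
def cache_line_fill(spi_clk_hz: int = 96_000_000,
                    lanes: int = 4,
                    addr_bits: int = 24,
                    mode_bits: int = 8,
                    dummy_clocks: int = 4,
                    line_words: int = 8):
    """Return (SCK cycles, fill time in ns, peak data rate in bit/s)."""
    cmd = 8                             # opcode sent on a single line
    addr = addr_bits // lanes           # 24 bits over 4 lines -> 6 clocks
    mode = mode_bits // lanes           # M7-0 over 4 lines    -> 2 clocks
    data = line_words * 32 // lanes     # 8 words over 4 lines -> 64 clocks
    cycles = cmd + addr + mode + dummy_clocks + data
    fill_ns = cycles * 1e9 / spi_clk_hz
    peak_bps = lanes * spi_clk_hz       # 4 bits per clock at 96 MHz
    return cycles, fill_ns, peak_bps


cycles, fill_ns, peak_bps = cache_line_fill()
# cycles is 84, fill_ns is 875 ns, peak_bps is 384_000_000 (384 Mbps)
```

Under these assumptions the sum 8 + 6 + 2 + 4 + 64 reproduces the 84 cycles quoted above, and 84 cycles at 96MHz give the 875ns fill time.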

Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by a program that instructs related hardware, and the program may be stored in a computer-readable storage medium, and the program may include the above-described embodiment methods when executed. And the aforementioned storage medium includes: various media capable of storing program code, such as ROM, RAM, magnetic or optical disks.

Claims

1. An on-chip execution pre-reading circuit based on a serial peripheral interface is characterized by comprising an SPI register module, an SPI interaction module and an SPI working module;

2. The circuit of claim 1, comprising a clock management unit configured to provide two master clock signals, wherein the two master clock signals comprise a low-speed clock signal for a low-speed clock domain and a high-speed clock signal for a high-speed clock domain, the low-speed clock domain comprising the SPI register module and the SPI interaction module, and the high-speed clock domain comprising the SPI operation module.

3. The circuit of claim 1, wherein the SPI interaction module is configured to parse a read operation on a bus and parse a read address on an SPI register module, and send the read operation and the read address to the high speed clock domain, and return received read data sent by the high speed clock domain to the bus.

4. The circuit according to claim 1 or 2, wherein the operation modes of the SPI configured by the SPI register module further include SPI mode0/mode3 mode selection, Dual SPI/Quad SPI/standard SPI interface mode selection, the waiting time for an address continuous read request in the WAIT state, and the delay chain size of the pin delay compensation circuit; the SPI register module is also used for configuring relevant information of the command sequence to be sent by the SPI, wherein the relevant information comprises the command sequence communication read instruction, the number of DUMMY clocks to be sent and the continuous read mode bits M7-0 to be sent.

5. The circuit of claim 1, wherein an address matching processing unit is disposed in the SPI interaction module, the address matching processing unit being configured to ensure that read data returned to the bus corresponds to read addresses one by one, and in a pre-read enabled on state, the address matching processing unit being configured to determine whether a current pre-read operating state is operating normally according to whether read addresses of the bus are consecutive.

6. The circuit of claim 1, wherein a timeout forced release bus mechanism is provided in the SPI interface module, the timeout forced release bus mechanism being configured to prevent the SPI interface module from occupying the bus for more than a predetermined period of time.

7. The circuit of claim 2, wherein the communication rate of the SPI operating module is consistent with an operating clock frequency within itself; the communication clock SCK generated by the SPI working module is obtained by controlling the output of the internal working clock according to a clock gating mode.

8. A circuit according to claim 2, wherein the SPI operating module is provided with a pin delay compensation circuit for compensating the delay of the sampling receive data clock and the SCK signal of the pin.

9. The circuit of claim 1, wherein the SPI operation module is configured to generate an SPI operation sequence according to a preset finite state machine, the finite state machine comprising 8 operation states, the operation states comprising: IDLE state, START state, CMD transmission state, ADDR transmission state, addr_m transmission state, DUMMY transmission state, RXDAT transmission state, WAIT transmission state.

10. The circuit of claim 1, wherein the received data buffer unit in the SPI operation module uses an asynchronous FIFO of depth 3 and width 8 bits; in the pre-reading working state, an asynchronous FIFO with the depth of 2 layers and the width of 32 bits is adopted for caching pre-acquired data.