CN118276816A

CN118276816A - Instruction sampling system and method

Info

Publication number: CN118276816A
Application number: CN202211742245.2A
Authority: CN
Inventors: Name withheld at the inventors' request
Original assignee: Anhui Cambricon Information Technology Co Ltd
Current assignee: Anhui Cambricon Information Technology Co Ltd
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2024-07-02

Abstract

The present disclosure provides a system and method for sampling instructions. The system may be included in a combined processing apparatus that may also include a universal interconnect interface and other processing devices. A computing device of the combined processing apparatus interacts with the other processing devices to jointly complete a computing operation specified by the user. The combined processing apparatus may further include a storage device, connected to the computing device and to the other processing devices, for storing data of the computing device and the other processing devices.

Description

Instruction sampling system and method

Technical Field

The present disclosure relates to the field of chips, and more particularly, to the field of sampling of instructions.

Background

Fig. 5 shows a schematic diagram of a conventional MMC system.

As shown in Fig. 5, communication in an MMC (multimedia card) control system generally falls into two types: transmission of control instructions and transmission of data instructions. Control instructions and data instructions are typically transmitted over 2-wire, 4-wire, 8-wire, or 16-wire links, and the MMC controller typically operates in one of several modes, such as high-speed mode (HIGH SPEED), single data rate (SDR) mode, and double data rate (DDR) mode. In SDR mode, the control instruction (CMD) and the data instruction (DATA) have the same timing relative to the clock signal (CLK): both are sampled at the rising edge of the clock signal. In DDR mode, the two differ in timing relative to the clock signal: control instructions are sampled at the rising edge only, whereas data instructions are sampled at both the rising edge and the falling edge. If the control instruction and the data instruction are given the same delay, then when the clock edge is centered on the control instruction it may not be centered on the data instruction, which may cause sampling of the data instruction to fail.
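As a rough illustration of the two sampling schemes, the sketch below lists the clock edges at which the control line and the data lines are sampled in each mode. The function name `sampling_edges` and the idealized clock (rising edges at multiples of `period`, 50% duty cycle) are assumptions made for this example and do not come from the disclosure.

```python
def sampling_edges(mode, period, cycles):
    """Return (cmd_times, data_times): sampling instants for CMD and DATA.

    Assumes an ideal clock whose rising edges fall at k * period and whose
    falling edges fall at k * period + period / 2 (50% duty cycle).
    """
    rising = [k * period for k in range(cycles)]
    falling = [k * period + period / 2 for k in range(cycles)]
    cmd_times = rising  # CMD is sampled on rising edges in both modes
    if mode == "SDR":
        data_times = rising  # DATA shares the CMD sampling edges
    elif mode == "DDR":
        data_times = sorted(rising + falling)  # DATA uses both edges
    else:
        raise ValueError(f"unknown mode: {mode}")
    return cmd_times, data_times
```

In DDR mode the data sampling instants are half a period apart, so each data bit occupies half the time of a control bit; this is why one shared delay cannot center the clock edge on both.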

The most common practice in current mainstream MMC control systems is to apply different signal delays to the control instruction signal line and the data instruction signal line. Fig. 6 shows an instruction delay approach of the prior art.

As shown in Fig. 6, a first-in-first-out (FIFO) buffer module receives the control instruction stream and the data instruction stream and buffers the data instructions and control instructions in advance. The delay module is built from digital logic circuits and is used to generate a delay time and adjust the corresponding sampling window. The driving module is used to drive the external device.

For convenience of description, a stream transmitted from the transmitting end (Tx) to the receiving end (Rx) is referred to as a "forward stream", and a stream received at the transmitting end (Tx) is referred to as a "reverse stream".

As shown in part (a) of Fig. 6, the forward data instruction stream is first buffered in FIFO buffer module 0 and then passes through sampling switch 0. The opening and closing of sampling switch 0 is controlled by delay module 0 so as to adjust the delay of the forward data instruction stream. After passing through sampling switch 0, the forward data instruction stream reaches driving module 0. As shown in part (b) of Fig. 6, the forward control instruction stream is first buffered in FIFO buffer module 1 and then passes through sampling switch 1, whose opening and closing is controlled by delay module 1 so as to adjust the delay of the forward control instruction stream. After passing through sampling switch 1, the forward control instruction stream reaches driving module 1. As shown in part (c) of Fig. 6, the clock signal is first buffered in FIFO buffer module 2 and then passes through sampling switch 2, whose opening and closing is controlled by delay module 2 so as to adjust the delay of the clock signal. After passing through sampling switch 2, the clock signal reaches driving module 2.

For the reverse streams, as shown in part (d) of Fig. 6, the reverse data instruction stream flows from driving module 3 through sampling switch 3, whose opening and closing is controlled by delay module 3 so as to adjust the delay of the reverse data instruction stream. After passing through sampling switch 3, the reverse data instruction stream is buffered in FIFO buffer module 3. As shown in part (e) of Fig. 6, the reverse control instruction stream flows from driving module 4 through sampling switch 4, which is controlled by delay module 4 so as to adjust the delay of the reverse control instruction stream. After passing through sampling switch 4, the reverse control instruction stream is buffered in FIFO buffer module 4.

Fig. 7 shows an example of the timing between a data instruction, a control instruction, and a clock signal. The data instruction, the control instruction, and the clock signal must all be delay-adjusted so that the data instruction stream and the control instruction stream are sampled with the largest possible window, that is, at the middle of each control instruction and data instruction. Ideally, t1 = t2, so that the rising edge or the falling edge of the clock signal falls at the middle of the corresponding control instruction or data instruction.
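The centering condition t1 = t2 can be made concrete with a small calculation. The sketch below assumes that t1 is the time from the start of the valid window of an instruction bit to the sampling edge and that t2 is the time from the sampling edge to the end of that window; this reading of Fig. 7, and the function name `centering_delay`, are assumptions made for illustration.

```python
def centering_delay(window_start, window_end, edge_time):
    """Return (t1, t2, delay) for one sampling edge.

    t1: time from the start of the valid window to the sampling edge.
    t2: time from the sampling edge to the end of the valid window.
    delay: extra delay to apply to the instruction so that t1 == t2.
    """
    t1 = edge_time - window_start
    t2 = window_end - edge_time
    # Delaying the instruction by d shrinks t1 by d and grows t2 by d,
    # so t1 - d == t2 + d gives d = (t1 - t2) / 2.
    delay = (t1 - t2) / 2
    return t1, t2, delay
```

For example, with a window from 0 to 10 and a sampling edge at 8, t1 is 8 and t2 is 2; delaying the instruction by 3 moves the window to 3..13, whose midpoint is the edge at 8. A negative result would mean the instruction arrives too late relative to the edge.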

As the figures above show, the existing approach minimizes the coupling among the data instruction, the control instruction, and the clock signal, and makes the timing among the three as controllable as possible. However, it places very high demands on the design: the three paths are independent of one another, which uses resources inefficiently. In an MMC design in particular, no data instruction is transmitted while a control instruction is being sent, so resources are wasted and the complexity of the design increases.

Disclosure of Invention

It is an object of the present disclosure to adjust the sampling instant of an instruction by adjusting the length of the instruction buffer.

According to a first aspect of the present disclosure, there is provided a system for sampling instructions, comprising an instruction buffer and a driver, wherein the instruction buffer is a first-in first-out (FIFO) buffer and is connected with the driver; the FIFO buffer is configured to receive instructions to or from the driver; the length L of the FIFO buffer is greater than the length n of the instruction, and the value of L-n is set to adjust the position of the instruction leaving the FIFO buffer relative to the clock signal, so that the instruction is sampled within a sampling window.

According to a second aspect of the present disclosure, there is provided an electronic device comprising a system as described above.

According to a third aspect of the present disclosure, there is provided a method for sampling instructions, comprising: dynamically adjusting the length L of an instruction buffer, wherein the instruction buffer is a first-in first-out (FIFO) buffer used for receiving instructions; and adjusting, by adjusting the value of L-n, the position of the instruction leaving the FIFO buffer relative to the clock signal, so that the instruction is sampled within a sampling window, wherein the length L of the FIFO buffer is greater than the length n of the instruction.

According to a fourth aspect of the present disclosure, there is provided a controller comprising: one or more processors; and a memory having stored therein computer executable instructions that, when executed by the one or more processors, cause the controller to perform the method as described above.

With the technical solution of the present disclosure, the delay device of the prior art is not needed; sampling of the instruction can be accomplished simply by adjusting the length of the instruction buffer.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

FIG. 1 shows a schematic structural view of a board of an embodiment of the present disclosure;

Fig. 2 is a schematic diagram showing the combination processing apparatus of this embodiment;

FIG. 3 illustrates an internal structural schematic of a computing device;

FIG. 4 illustrates the internal architecture of a processing core;

FIG. 5 shows a schematic diagram of a conventional MMC system;

FIG. 6 illustrates one instruction delay approach of the prior art;

FIG. 7 illustrates an example of timing between data instructions, control instructions, and clock signals;

FIG. 8 illustrates a system for sampling instructions according to one embodiment of the present disclosure;

FIG. 9 illustrates a system for sampling instructions according to one embodiment of the present disclosure;

FIG. 10 illustrates an exemplary format of a data instruction;

FIG. 11 illustrates a system for sampling instructions according to one embodiment of the present disclosure;

FIG. 12 illustrates a system for sampling instructions according to one embodiment of the present disclosure;

FIG. 13 shows an example of a timing diagram according to one embodiment of the present disclosure; and

Fig. 14 illustrates a method for sampling instructions in accordance with an aspect of the present disclosure.

Detailed Description

The embodiments of the present disclosure are described clearly and fully below with reference to the accompanying drawings. The embodiments described are evidently some, but not all, of the embodiments of the disclosure. All other embodiments that those skilled in the art can obtain from the embodiments in this disclosure without inventive effort fall within the scope of the present disclosure.

It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "first," "second," "third," and "fourth," etc. also do not denote a single term, but may denote a plurality of terms. The terms "comprises" and "comprising", when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determination", "in response to determination", "upon detection of a [described condition or event]", or "in response to detection of a [described condition or event]".

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

Today's semiconductor fabrication process begins with a complete wafer, a circular slice of pure silicon typically available in 6-inch, 8-inch, 12-inch, and other sizes. The wafer is cut into individual pieces called dies. Each die carries a chip and wiring arranged to perform a specific electrical function. Each die is then packaged as a unit. Packaging serves to seat, fix, seal, and protect the chip and to enhance its electrical and thermal performance; at the same time, the contacts of the chip are wired to the pins of the package housing, which completes the chip package structure.

The memory is used for temporarily storing operation data required by the system on chip and data exchanged with the external memory. In this embodiment, the memory may be a high-bandwidth memory (High Bandwidth Memory, HBM), which is a high-performance DRAM fabricated based on a 3D stack process, and is suitable for applications requiring high memory bandwidth, such as graphics processors, network switching and forwarding devices (e.g., routers, switches), etc.

A system on chip (SoC) refers to a technology that integrates a complete system on a single chip, packaging all or part of the necessary electronic circuits together. In this embodiment, the system-on-chip is mounted on a board. Fig. 1 shows a schematic structural diagram of a board 10 according to an embodiment of the present disclosure. As shown in Fig. 1, the board 10 includes a combination processing apparatus 101, which is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in fields such as computer vision, voice, natural language processing, and data mining. Deep learning technology in particular is widely applied in the cloud intelligence field. A notable characteristic of cloud intelligence applications is the large volume of input data, which places high demands on the storage capacity and computing capacity of the platform.

The combination processing apparatus 101 is connected to an external device 103 via an external interface device 102. The external device 103 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. The data to be processed may be transferred by the external device 103 to the combination processing apparatus 101 through the external interface device 102. The calculation result of the combination processing apparatus 101 may be transmitted back to the external device 103 via the external interface device 102. The external interface device 102 may take different interface forms, such as a PCIe interface, depending on the application scenario.

The board 10 also includes an external memory 104 for storing data, which includes one or more memory units 105. The external memory 104 is connected to the control device 106 and the combination processing apparatus 101 via a bus and transmits data. The control device 106 in the board 10 is configured to regulate the state of the combination processing apparatus 101. To this end, in one application scenario, the control device 106 may include a single chip microcomputer (Micro Controller Unit, MCU).

Fig. 2 is a schematic diagram showing the combination processing apparatus 101 of this embodiment. As shown in fig. 2, the combination processing device 101 includes a computing device 201, an interface device 202, a processing device 203, and a DRAM 204. In one application scenario, the computing device 201, the interface device 202, and the processing device 203 are integrated into the aforementioned system-on-chip. In another application scenario, the computing device 201 itself is the aforementioned system-on-chip.

The computing device 201 is configured to perform user-specified operations and is primarily implemented as a single-core or multi-core intelligent processor that performs deep learning or machine learning computations. It may interact with the processing device 203 through the interface device 202 to jointly complete the user-specified operations.

The interface device 202 is used for transmitting data and control instructions between the computing device 201 and the processing device 203. For example, the computing device 201 may obtain input data from the processing device 203 via the interface device 202 and write it to an on-chip storage device of the computing device 201. Further, the computing device 201 may obtain control instructions from the processing device 203 via the interface device 202 and write them into an on-chip control cache of the computing device 201. Alternatively or in addition, the interface device 202 may read data in the storage device of the computing device 201 and transmit it to the processing device 203.

The processing device 203 is a general-purpose processing device that performs basic control including, but not limited to, data handling and starting and/or stopping the computing device 201. Depending on the implementation, the processing device 203 may be one or more types of processors, including but not limited to a central processing unit, a graphics processor, or another general-purpose and/or special-purpose processor such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components; the number of processors may be determined according to actual needs. Considered on its own, the computing device 201 of the present disclosure may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing device 201 and the processing device 203 are considered together, they are regarded as forming a heterogeneous multi-core structure.

The DRAM 204 is the aforementioned high-bandwidth memory for storing data to be processed, typically 16G or more, and is used for storing data of the computing device 201 and/or the processing device 203.

Fig. 3 shows a schematic diagram of the internal structure of the computing device 201. The computing device 201 is configured to process input data such as computer vision, voice, natural language, and data mining, and the computing device 201 in the figure adopts a multi-core hierarchical structure design, which includes an external memory controller 301, a peripheral communication module 302, an on-chip interconnection module 303, a synchronization module 304, and a plurality of clusters 305.

There may be a plurality of external memory controllers 301, of which 2 are shown by way of example. They are used for accessing external memory devices, such as the DRAM 204 in Fig. 2, to read data from or write data to off-chip memory in response to an access request issued by a processor core. The peripheral communication module 302 is configured to receive a control signal from the processing device 203 through the interface device 202 and to start the computing device 201 to perform a task. The on-chip interconnect module 303 connects the external memory controller 301, the peripheral communication module 302, and the plurality of clusters 305 for transferring data and control signals between the respective modules. The synchronization module 304 is a global barrier controller (GBC) for coordinating the working progress of each cluster to ensure synchronization of information. The plurality of clusters 305 are the computing cores of the computing device 201, of which 4 are shown by way of example; as hardware advances, the computing device 201 of the present disclosure may also include 8, 16, 64, or even more clusters 305. The clusters 305 are used to efficiently execute deep learning algorithms.

Each cluster 305 includes a plurality of processor cores (IPU cores) 306 and a memory core (MEM core) 307.

The processor cores 306 are illustratively shown as 4 in the figures, and the present disclosure does not limit the number of processor cores 306. The internal architecture is shown in fig. 4. Each processor core 306 includes three major modules: a control module 41, an operation module 42 and a storage module 43.

The control module 41 is used for coordinating and controlling the operation of the operation module 42 and the storage module 43 to complete the deep learning task, and comprises an instruction fetch unit (IFU) 411 and an instruction decode unit (IDU) 412. The instruction fetch unit 411 is configured to fetch an instruction from the processing device 203, and the instruction decode unit 412 decodes the fetched instruction and sends the decoded result to the operation module 42 and the storage module 43 as control information.

The operation module 42 includes a vector operation unit 421 and a matrix operation unit 422. The vector operation unit 421 is used for performing vector operations and can support complex operations such as vector multiplication, addition, nonlinear transformation, etc.; the matrix operation unit 422 is responsible for the core computation of the deep learning algorithm, i.e. matrix multiplication and convolution.

The storage module 43 is used for storing or moving related data and includes a neuron storage unit (neuron RAM, NRAM) 431, a weight storage unit (weight RAM, WRAM) 432, an input/output direct memory access module (input/output direct memory access, IODMA) 433, and a move direct memory access module (move direct memory access, MVDMA) 434. NRAM 431 is used to store input data, output data, and intermediate results for computation by the processor core 306; WRAM 432 is used to store the weights of the deep learning network; IODMA 433 controls access between NRAM 431/WRAM 432 and DRAM 204 over the broadcast bus 309; MVDMA 434 is used to control access between NRAM 431/WRAM 432 and SRAM 308.

Returning to Fig. 3, the memory core 307 is primarily used for storage and communication, that is, for storing shared data or intermediate results among the processor cores 306 and for carrying out communication between the clusters 305 and the DRAM 204, among the clusters 305, and among the processor cores 306. In other embodiments, the memory core 307 also has scalar operation capability and can perform scalar operations.

The memory core 307 includes a shared memory unit (SRAM) 308, a broadcast bus 309, a cluster direct memory access module (cluster direct memory access, CDMA) 310, and a global direct memory access module (global direct memory access, GDMA) 311. The SRAM 308 serves as a high-performance data transfer station. Data reused by different processor cores 306 in the same cluster 305 need not be fetched from the DRAM 204 by each processor core 306 separately; instead, it is relayed among the processor cores 306 through the SRAM 308. The memory core 307 only needs to distribute the reused data quickly from the SRAM 308 to the plurality of processor cores 306, which improves inter-core communication efficiency and greatly reduces on-chip/off-chip input/output accesses.

The broadcast bus 309, CDMA 310, and GDMA 311 are used to perform communication among the processor cores 306, communication among the clusters 305, and data transfer between the clusters 305 and the DRAM 204, respectively. Each is described below.

The broadcast bus 309 is used to perform high-speed communication between the processor cores 306 in the cluster 305. The broadcast bus 309 of this embodiment supports inter-core communication modes including unicast, multicast and broadcast. Unicast refers to the transmission of data from point to point (i.e., single processor core to single processor core), multicast is a communication scheme that transfers a piece of data from SRAM 308 to a specific number of processor cores 306, and broadcast is a communication scheme that transfers a piece of data from SRAM 308 to all processor cores 306, a special case of multicast.

CDMA 310 is used to control access to SRAM 308 between different clusters 305 within the same computing device 201. GDMA 311, in conjunction with the external memory controller 301, is used to control access by the SRAM 308 of a cluster 305 to the DRAM 204, or to read data from the DRAM 204 into the SRAM 308.

Fig. 8 illustrates a system for sampling instructions according to one embodiment of the present disclosure. The system includes an instruction buffer 81 and a driver 83, wherein the instruction buffer 81 is a first-in first-out (FIFO) buffer and is connected with the driver 83; the FIFO buffer 81 is configured to receive instructions to or from the driver 83; the length L of the FIFO buffer 81 is greater than the length n of the instruction, and the value of L-n is set to adjust the position of the instruction leaving the FIFO buffer 81 relative to the clock signal, so that the instruction is sampled within the sampling window.

In computing, a first-in first-out (FIFO) queue is a conventional sequential scheme in which the instruction that enters first is completed and leaves first, after which the next instruction is executed. For example, instruction 0 enters the queue first, followed by instruction 1, instruction 2, and so on. When the processor finishes its current instruction, instruction 0 is fetched from the queue for execution, instruction 1 takes the place of instruction 0, and instructions 2, 3, and so on each move forward by one position.

For the scheme of the present disclosure, assume that the length of the instruction is n and the length of the FIFO buffer 81 is L. If L > n, the instruction is not output from the FIFO buffer 81 immediately after the whole instruction has entered the FIFO buffer 81; it is output only after advancing through the buffer for a further period. This corresponds to applying a certain delay to the instruction. Whereas the prior art uses a delay module to delay instructions, the scheme of the present disclosure delays instructions by changing the length of the FIFO buffer.
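The effect of L > n can be illustrated with a small simulation. The sketch below models the FIFO buffer as a shift register that advances by one slot per clock cycle and accepts one instruction bit per cycle; this bit-serial model, the use of `None` for an empty slot, and the function name `fifo_output_cycles` are assumptions for illustration, not the implementation described in the disclosure.

```python
from collections import deque


def fifo_output_cycles(bits, fifo_length):
    """Shift `bits` through a FIFO of `fifo_length` slots, one slot per cycle.

    Returns a list of (cycle, bit) pairs giving the cycle on which each bit
    leaves the FIFO. Cycles are counted from 1.
    """
    n = len(bits)
    if fifo_length < n:
        raise ValueError("FIFO length L must be at least the instruction length n")
    fifo = deque([None] * fifo_length)  # None marks an empty slot
    pending = deque(bits)
    out = []
    cycle = 0
    while len(out) < n:
        cycle += 1
        fifo.append(pending.popleft() if pending else None)  # next bit enters
        leaving = fifo.popleft()                              # oldest slot leaves
        if leaving is not None:
            out.append((cycle, leaving))
    return out
```

In this model the first bit leaves on cycle L + 1. With L = n that is the cycle right after the last bit has entered; each additional slot postpones the output by one more cycle, so the instruction is delayed by L - n cycles while its bit order is preserved.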

Further, in the above description, "receiving an instruction to the driver 83" means that the FIFO buffer 81 receives an instruction and sends the instruction to the driver 83; and "receiving an instruction from the driver 83" means that the FIFO buffer 81 receives an instruction from the driver 83.

According to the above embodiment, if the delay amount for each instruction is fixed, the length L of the FIFO buffer may be preset, where L is greater than the length n of the instruction, so that the instruction may be delayed by the corresponding delay amount every time it enters the FIFO buffer.

The FIFO buffer may be implemented in hardware or in software, and the implementation of the FIFO is not limited in this disclosure.

Fig. 9 illustrates a system for sampling instructions according to one embodiment of the present disclosure. As shown in fig. 9, according to one embodiment of the present disclosure, the system further includes a controller 85, and the controller 85 is connected to the instruction buffer 81 to dynamically adjust the length L of the instruction buffer 81.

In the embodiment shown in fig. 9, the instruction buffer 81 (or FIFO buffer 81) may be implemented in software, and the length L of the FIFO buffer 81 may be adjusted according to the actual delay requirements, so as to be able to accommodate a wider range of requirements and applications. The length of FIFO buffer 81 may be adjusted by controller 85 to accommodate various delay requirements.

According to one embodiment of the present disclosure, the controller is configured to detect the type of the instruction and to dynamically adjust the length L of the instruction buffer according to that type.

To determine the type of instruction, it may be determined whether the instruction is a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction by detecting a start bit of the instruction.

Table 1 shows the data format of the forward control instruction.

Bit position | 47 | 46 | [45:40] | [39:8] | [7:1] | 0
Width (bits) | 1 | 1 | 6 | 32 | 7 | 1
Value | '1' | x | '1'
Description | Start bit | Transmission bit | Command index | Argument | CRC7 | End bit

TABLE 1
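The field boundaries in Table 1 can be unpacked from a 48-bit frame with shifts and masks. The sketch below is illustrative: the function name `parse_forward_command` and the dictionary keys are hypothetical, and the 32-bit field at bits [39:8] is called `argument` here. The example frame in the usage note is the well-known 6-byte GO_IDLE_STATE (CMD0) frame from MMC/SD practice, not a value taken from this disclosure.

```python
def parse_forward_command(frame):
    """Split a 48-bit forward control instruction into the fields of Table 1."""
    if not 0 <= frame < (1 << 48):
        raise ValueError("frame must fit in 48 bits")
    return {
        "start_bit": (frame >> 47) & 0x1,          # bit 47
        "transmission_bit": (frame >> 46) & 0x1,   # bit 46
        "command_index": (frame >> 40) & 0x3F,     # bits [45:40], 6 bits
        "argument": (frame >> 8) & 0xFFFFFFFF,     # bits [39:8], 32 bits
        "crc7": (frame >> 1) & 0x7F,               # bits [7:1], 7 bits
        "end_bit": frame & 0x1,                    # bit 0
    }
```

For instance, `parse_forward_command(0x400000000095)` yields a transmission bit of 1, a command index of 0, an argument of 0, a CRC7 field of 0x4A, and an end bit of 1.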

Table 2 shows the data format of the reverse control instruction.

TABLE 2

FIG. 10 illustrates an exemplary format of a data instruction.

As can be seen from Table 1, Table 2, and Fig. 10, each of the forward control instruction, the reverse control instruction, the forward data instruction, and the reverse data instruction has a start bit and an end bit. By detecting the start bit of each instruction, the type of the instruction (forward control, reverse control, forward data, or reverse data) can be determined. The length L of the instruction buffer may then be dynamically adjusted according to the detected start bit to suit the corresponding instruction type. It should be understood that the data formats of Table 1, Table 2, and Fig. 10 are merely examples and do not limit the data format; the start bits of the different instruction types may differ to facilitate identification.
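One possible way for a controller to map a detected start pattern to a buffer length is sketched below. Everything concrete in it is hypothetical: the two-bit start patterns, the per-type values of L - n, and the function name `adjusted_buffer_length` are chosen only to illustrate the idea, since the disclosure states merely that the start bits of the instruction types may differ.

```python
# Hypothetical start patterns; the disclosure only says start bits may differ by type.
START_PATTERN_TO_TYPE = {
    0b00: "forward_control",
    0b01: "reverse_control",
    0b10: "forward_data",
    0b11: "reverse_data",
}

# Hypothetical extra slots (L - n) per instruction type; values are illustrative.
EXTRA_SLOTS = {
    "forward_control": 2,
    "reverse_control": 3,
    "forward_data": 1,
    "reverse_data": 4,
}


def adjusted_buffer_length(start_pattern, instruction_length):
    """Return (instruction_type, L) for an instruction of length n.

    L is n plus the extra slots configured for the detected type, so L > n
    holds as long as every configured value is at least 1.
    """
    instruction_type = START_PATTERN_TO_TYPE[start_pattern]
    return instruction_type, instruction_length + EXTRA_SLOTS[instruction_type]
```

Keeping the per-type values in a table means the controller can be retuned for a different delay requirement without changing the detection logic.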

It should also be appreciated that the above description has been given by way of example of a forward control instruction, a reverse control instruction, a forward data instruction or a reverse data instruction, but the application is not limited to the above instruction types, but extends to any other desired and suitable type.

Fig. 11 illustrates a system for sampling instructions according to one embodiment of the present disclosure, as shown in fig. 11, the instruction buffer 81 includes: a forward control instruction buffer 811 and a forward data instruction buffer 812; the driver 83 includes a first driver 831 and a second driver 832; wherein the forward control instruction buffer 811 is configured to receive and buffer a forward control instruction stream flowing to the first driver 831; the forward data instruction buffer 812 is configured to receive and buffer a forward data instruction stream flowing to the second driver 832; the controller 85 is coupled to the forward control instruction buffer 811 and the forward data instruction buffer 812.

As shown in fig. 11, the above-described instruction stream may be divided into a forward control instruction stream and a forward data instruction stream, as described above, the forward stream being streamed from the MMC host to the MMC device. Unlike the prior art, the technical solution of the present disclosure may delay the forward control instruction and the forward data instruction by adjusting the lengths of the forward control instruction buffer 811 and the forward data instruction buffer 812, respectively, so that the sampling edge of the clock signal falls in the middle of the output forward control instruction and forward data instruction.

Fig. 12 illustrates a system for sampling instructions according to one embodiment of the present disclosure. As shown in fig. 12, the instruction buffer 81 further includes a reverse control instruction buffer 813 and a reverse data instruction buffer 814, and the driver 83 further includes a third driver 833 and a fourth driver 834. The reverse control instruction buffer 813 is configured to receive and buffer a reverse control instruction stream from the third driver 833; the reverse data instruction buffer 814 is configured to receive and buffer a reverse data instruction stream from the fourth driver 834; and the controller 85 is connected to the reverse control instruction buffer 813 and the reverse data instruction buffer 814.

In fig. 12, by way of example, the controller 85 is connected to a forward control instruction buffer 811 via signal path 1, to a forward data instruction buffer 812 via signal path 2, to a reverse control instruction buffer 813 via signal path 3, and to a reverse data instruction buffer 814 via signal path 4.

As shown in fig. 12, the above-mentioned instruction stream may further include a reverse control instruction stream and a reverse data instruction stream. As described above, a reverse stream flows from the MMC device to the MMC host. According to the technical solution of the present disclosure, the reverse control instruction and the reverse data instruction can be delayed separately by adjusting the lengths of the reverse control instruction buffer 813 and the reverse data instruction buffer 814, respectively, so that the sampling edge of the clock signal falls in the middle of the output reverse control instruction and of the output reverse data instruction.

In the embodiments shown in figs. 11 and 12, the clock signal does not need to be delayed. Instead, the time at which an instruction is output is adjusted solely by adjusting the length of the instruction buffer 81, which is equivalent to applying a corresponding delay to the instruction. Because the length of the instruction buffer 81 is adjusted by the controller, various delay situations can be handled flexibly according to actual conditions.

According to one embodiment of the present disclosure, in response to detecting that the start bit of the instruction is the start bit of a forward control instruction, wherein the forward control instruction has a first instruction length n1, control of the forward control instruction buffer 811 is enabled to dynamically adjust the first length L1 of the forward control instruction buffer 811 such that the value of L1-n1 enables sampling of the forward control instruction within a sampling window. Specifically, if the instruction is detected as a forward control instruction of length n1, the controller 85 opens signal path "1" to adjust the length L1 of the forward control instruction buffer 811, so that the forward control instruction is output only after passing through the entire length L1. The value of L1-n1 is set such that the rising and/or falling edge of the clock signal falls exactly in the middle of the forward control instruction. In this case, the forward control instruction buffer 811 operates, and the forward data instruction buffer 812, the reverse control instruction buffer 813, and the reverse data instruction buffer 814 are turned off. Accordingly, signal paths "2", "3" and "4" remain closed.

In response to detecting that the start bit of the instruction is the start bit of a forward data instruction, wherein the forward data instruction has a second instruction length n2, control of the forward data instruction buffer 812 is enabled to dynamically adjust the second length L2 of the forward data instruction buffer 812 such that the value of L2-n2 enables sampling of the forward data instruction within a sampling window. Specifically, if the instruction is detected as a forward data instruction of length n2, the controller 85 opens signal path "2" to adjust the length L2 of the forward data instruction buffer 812, so that the forward data instruction is output only after passing through the entire length L2. The value of L2-n2 is set such that the rising and/or falling edge of the clock signal falls exactly in the middle of the forward data instruction. In this case, the forward data instruction buffer 812 operates, and the forward control instruction buffer 811, the reverse control instruction buffer 813, and the reverse data instruction buffer 814 are turned off. Accordingly, signal paths "1", "3" and "4" remain closed.

In response to detecting that the start bit of the instruction is the start bit of a reverse control instruction, wherein the reverse control instruction has a third instruction length n3, control of the reverse control instruction buffer 813 is enabled to dynamically adjust the third length L3 of the reverse control instruction buffer 813 such that the value of L3-n3 enables sampling of the reverse control instruction within a sampling window. Specifically, if the instruction is detected as a reverse control instruction of length n3, the controller 85 opens signal path "3" to adjust the length L3 of the reverse control instruction buffer 813, so that the reverse control instruction is output only after passing through the entire length L3. The value of L3-n3 is set such that the rising and/or falling edge of the clock signal falls exactly in the middle of the reverse control instruction. In this case, the reverse control instruction buffer 813 operates, and the forward control instruction buffer 811, the forward data instruction buffer 812, and the reverse data instruction buffer 814 are turned off. Accordingly, signal paths "1", "2" and "4" remain closed.

In response to detecting that the start bit of the instruction is the start bit of a reverse data instruction, wherein the reverse data instruction has a fourth instruction length n4, control of the reverse data instruction buffer 814 is enabled to dynamically adjust the fourth length L4 of the reverse data instruction buffer 814 such that the value of L4-n4 enables sampling of the reverse data instruction within a sampling window. Specifically, if the instruction is detected as a reverse data instruction of length n4, the controller 85 opens signal path "4" to adjust the length L4 of the reverse data instruction buffer 814, so that the reverse data instruction is output only after passing through the entire length L4. The value of L4-n4 is set such that the rising and/or falling edge of the clock signal falls exactly in the middle of the reverse data instruction. In this case, the reverse data instruction buffer 814 operates, and the forward control instruction buffer 811, the forward data instruction buffer 812, and the reverse control instruction buffer 813 are turned off. Accordingly, signal paths "1", "2" and "3" remain closed.
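The four cases above follow a single pattern, which may be sketched in software as follows. This is a minimal Python model under stated assumptions, not the disclosed hardware: the names InstrType, Controller and on_instruction, and all numeric lengths, are hypothetical; only the signal path numbers 1 to 4 and the buffer numerals 811 to 814 are taken from the description.

```python
from enum import Enum


class InstrType(Enum):
    """Instruction types; each value is the signal path number from fig. 12."""
    FWD_CTRL = 1  # forward control instruction buffer 811
    FWD_DATA = 2  # forward data instruction buffer 812
    REV_CTRL = 3  # reverse control instruction buffer 813
    REV_DATA = 4  # reverse data instruction buffer 814


class Controller:
    """Hypothetical model of controller 85: one adjustable length per buffer."""

    def __init__(self, target_lengths):
        # target_lengths maps InstrType -> target buffer length L_i (assumed values).
        self.target_lengths = dict(target_lengths)
        self.open_paths = set()
        self.buffer_lengths = {t: None for t in InstrType}

    def on_instruction(self, instr_type, n):
        """Open only the matching signal path and set that buffer's length.

        Returns the extension L_i - n_i applied to the instruction.
        """
        length = self.target_lengths[instr_type]
        if length <= n:
            raise ValueError("buffer length L must exceed instruction length n")
        # Exactly one path is open; the other three remain closed.
        self.open_paths = {instr_type.value}
        self.buffer_lengths[instr_type] = length
        return length - n
```

For example, with an assumed target length of 52 for the forward control instruction buffer and an assumed instruction length of 48, on_instruction(InstrType.FWD_CTRL, 48) leaves only signal path 1 open and returns an extension of 4.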

According to one embodiment of the present disclosure, the controller is further configured to store an extension parameter in advance, the extension parameter indicating the length L to which the instruction buffer is adjusted. According to the above embodiments of the present disclosure, any parameter that enables the length of the instruction buffer to be adjusted to the target length may be stored in the controller. For example, the target length L may be stored directly in the controller, in which case the controller directly adjusts the length of the instruction buffer to L according to the stored value. Specifically, the length of the forward control instruction buffer 811 may be adjusted to L1, the length of the forward data instruction buffer 812 may be adjusted to L2, the length of the reverse control instruction buffer 813 may be adjusted to L3, and the length of the reverse data instruction buffer 814 may be adjusted to L4.

The controller may also store an increment parameter d for each of the different instruction buffers, i.e., d = L - n. In this case, the initial length of the instruction buffer may be a fixed value n, and each time the length of the instruction buffer needs to be adjusted, the increment parameter d is added to the fixed value n to obtain the target length L.
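The two ways of storing the extension parameter, as a target length L or as an increment d = L - n, can be illustrated with a short Python sketch. The function name target_length and the example numbers are hypothetical and serve only to show that both representations yield the same target length.

```python
def target_length(n, L=None, d=None):
    """Resolve the target buffer length from a stored extension parameter.

    Exactly one of L (target length stored directly) or d (increment, with
    d = L - n) must be given. The names are illustrative assumptions.
    """
    if (L is None) == (d is None):
        raise ValueError("provide exactly one of L or d")
    target = L if L is not None else n + d
    if target <= n:
        raise ValueError("target length must exceed instruction length n")
    return target
```

With an assumed instruction length n = 48, storing L = 52 and storing d = 4 both resolve to a target length of 52.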

According to one embodiment of the present disclosure, the extension parameters of the instruction buffers for the forward data instruction and the reverse data instruction may be stored together, and the extension parameters of the instruction buffers for the forward control instruction and the reverse control instruction may be stored together.

According to one embodiment of the present disclosure, the extension parameter is set such that the sampling edge of the clock signal is located in the middle of the instruction signal.

According to one embodiment of the present disclosure, the sampling edge may be a rising edge and/or a falling edge of the clock signal.

Fig. 13 shows an example of a timing diagram according to one embodiment of the present disclosure.

As shown in fig. 13, when the length of the instruction buffer is not adjusted and the control instruction (CMD) is sampled, t1 < t2, i.e., the rising edge of the clock signal is to the left of the middle of the control instruction rather than at its middle; when the data instruction (DATA) is sampled, t1 > t2, i.e., the falling edge of the clock signal is to the right of the middle of the data instruction rather than at its middle. After the length of the corresponding instruction buffer is extended, when the control instruction (CMD) is sampled, t1 = t2, i.e., the rising edge of the clock signal is in the middle of the control instruction; and when the data instruction (DATA) is sampled, t1 = t2, i.e., the falling edge of the clock signal is in the middle of the data instruction.

According to one embodiment of the present disclosure, the period of the step size parameter may be set to T in the controller, and the instruction buffer is extended for a time length of (L-n) × T. The length L described above is expressed in terms of the size of the instruction buffer, and this size may be converted into a time parameter. The step size parameter with period T may be set by a phase-locked loop (PLL), each period T corresponding to one minimum unit of length; if the instruction buffer is to be extended by a length L-n, the total extension time is therefore (L-n) × T. In this way, the extension time can be controlled by the number of beats of the clock signal.
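The conversion between buffer length and extension time can be worked through in a few lines of Python. The function names and the numeric values (a step period of 250 ps, lengths of 48 and 52) are assumptions chosen only for illustration; integer picoseconds are used to avoid floating-point rounding.

```python
def extension_time_ps(L, n, T_ps):
    """Extension time (L - n) * T, in picoseconds, for a step period T_ps."""
    if L <= n:
        raise ValueError("buffer length L must exceed instruction length n")
    return (L - n) * T_ps


def length_for_delay(n, delay_ps, T_ps):
    """Smallest length L whose extension (L - n) * T is at least delay_ps."""
    if delay_ps <= 0 or T_ps <= 0:
        raise ValueError("delay and period must be positive")
    steps = -(-delay_ps // T_ps)  # integer ceiling division
    return n + steps
```

Under these assumed values, extending a buffer from n = 48 to L = 52 with T = 250 ps delays the instruction by 4 × 250 ps = 1000 ps; conversely, a desired delay of 900 ps rounds up to 4 steps and so also gives L = 52.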

The present disclosure also provides an electronic device comprising a system as described above.

Fig. 14 illustrates a method for sampling instructions according to one aspect of the present disclosure, including: in operation S1410, dynamically adjusting a length L of the instruction buffer, wherein the instruction buffer is a FIFO buffer and is used for receiving instructions; and in operation S1420, adjusting the relative position of the instruction leaving the FIFO buffer and the clock signal by adjusting the value of L-n, such that the sampling of the instruction falls within a sampling window, wherein the length L of the FIFO buffer is greater than the length n of the instruction.
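The effect of operations S1410 and S1420 can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that the FIFO buffer behaves like a shift register that advances by one position per step; the function names are hypothetical. Under that assumption, lengthening the buffer from n to L postpones the exit of the instruction by L - n steps.

```python
from collections import deque


def first_exit_step(depth, num_bits):
    """Step at which the first bit of an instruction leaves a FIFO of `depth` stages.

    The FIFO is modeled as a shift register: on every step one entry leaves
    at the head and one entry (an instruction bit, or None when idle) enters
    at the tail.
    """
    if depth < 1 or num_bits < 1:
        raise ValueError("depth and num_bits must be positive")
    fifo = deque([None] * depth)
    step = 0
    while True:
        step += 1
        # Bits are tagged 0, 1, 2, ... in arrival order; None means idle.
        incoming = step - 1 if step <= num_bits else None
        out = fifo.popleft()
        fifo.append(incoming)
        if out is not None and out == 0:
            return step


def extra_delay_steps(L, n):
    """Additional delay caused by extending the FIFO from length n to length L."""
    return first_exit_step(L, n) - first_exit_step(n, n)
```

With assumed lengths L = 52 and n = 48, extra_delay_steps(52, 48) evaluates to 4, matching the extension L - n used in the description.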

The above method may be performed in a controller (as shown in fig. 11 and 12) that may be connected to different instruction buffers (e.g., a forward control instruction buffer 811, a forward data instruction buffer 812, a reverse control instruction buffer 813, and a reverse data instruction buffer 814 as shown in fig. 12) through signal paths. The controller selectively turns on the signal channels to adjust the length of the corresponding instruction buffer.

According to one embodiment of the present disclosure, the method further includes detecting the type of the instruction so as to dynamically adjust the length L of the instruction buffer according to the type of the instruction.

As described above, the types of instructions may include forward control instructions, forward data instructions, reverse control instructions, and reverse data instructions. The disclosure is also applicable to any other type of instruction and is not limited by the examples of the disclosure.

According to one embodiment of the present disclosure, detecting the type of the instruction includes detecting whether the start bit of the instruction is that of a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction.

As can be seen from the above tables 1,2 and 10, each of the forward control instruction, the reverse control instruction, the forward data instruction and the reverse data instruction has a start bit and an end bit, and by detecting the start bit of each instruction, the type of the instruction, i.e., whether the forward control instruction, the reverse control instruction, the forward data instruction or the reverse data instruction, can be known.

According to one embodiment of the present disclosure, the method further includes pre-storing an extension parameter indicating the length L to which the instruction buffer is adjusted.

As described above, the extension parameter may be any parameter that enables the instruction buffer to reach the target length L, and not necessarily the target length itself.

According to one embodiment of the present disclosure, the sampling edge is a rising edge and/or a falling edge of the clock signal.

According to one embodiment of the present disclosure, the period of the step size parameter is set to T, and the instruction buffer is extended for a time length of (L-n) × T.

The above method steps and operations have been described in detail in connection with the system above, and will not be repeated here.

According to one aspect of the present disclosure, there is also provided a controller including: one or more processors; and a memory having stored therein computer executable instructions that, when executed by the one or more processors, cause the controller to perform the method as described above.

According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a PC device, an internet of things terminal, a mobile terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autopilot terminal, a means of transportation, a household appliance, and/or a medical device. The means of transportation includes an aircraft, a ship and/or a vehicle; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus. The electronic device or apparatus of the present disclosure may also be applied to the internet, the internet of things, data centers, energy sources, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in cloud, edge, terminal, etc. application scenarios related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, a computationally intensive electronic device or apparatus according to aspects of the present disclosure may be applied to a cloud device (e.g., a cloud server), while an electronic device or apparatus with lower power consumption may be applied to a terminal device and/or an edge device (e.g., a smart phone or camera).
In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or the edge device to simulate the hardware resources of the terminal device and/or the edge device, so as to complete unified management, scheduling and collaborative work of an end cloud entity or an edge cloud entity.

The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure; the description of the above embodiments is intended only to facilitate understanding of the method of the present disclosure and its core ideas. Those skilled in the art may, based on the teachings of the present disclosure, make modifications or variations to the specific embodiments and the scope of application, all of which fall within the scope of protection of the present disclosure. In view of the foregoing, this description should not be construed as limiting the present disclosure.

The technical solution of the present disclosure can be better understood by the following clauses.

Clause 1. A system for sampling instructions, comprising: an instruction buffer and a driver, wherein,

The instruction buffer is a first-in first-out FIFO buffer and is connected with the driver;

The FIFO buffer is configured to receive instructions to or from the driver;

the length L of the FIFO buffer is greater than the length n of the instruction and the value of L-n is set to adjust the relative position of the instruction and clock signal leaving the FIFO buffer so that the samples of the instruction are within a sampling window.

Clause 2. The system of clause 1, further comprising a controller coupled to the instruction buffer to dynamically adjust the length L of the instruction buffer.

Clause 3. The system of clause 2, wherein the controller is configured to detect the type of the instruction so as to dynamically adjust the length L of the instruction buffer according to the type of the instruction.

Clause 4. The system of clause 2 or 3, wherein the instruction buffer comprises: a forward control instruction buffer and a forward data instruction buffer; the driver comprises a first driver and a second driver; wherein,

The forward control instruction buffer is used for receiving and buffering a forward control instruction stream flowing to the first driver;

The forward data instruction buffer is used for receiving and buffering a forward data instruction stream flowing to the second driver;

the controller is connected with the forward control instruction buffer and the forward data instruction buffer.

Clause 5. The system of clause 4, wherein the instruction buffer further comprises a reverse control instruction buffer and a reverse data instruction buffer; the driver further comprises a third driver and a fourth driver, wherein,

The reverse control instruction buffer is used for receiving and buffering a reverse control instruction stream from the third driver;

The reverse data instruction buffer is used for receiving and buffering a reverse data instruction stream from the fourth driver;

The controller is connected with the reverse control instruction buffer and the reverse data instruction buffer.

Clause 6. The system of clause 5, wherein detecting the type of the instruction comprises detecting whether the start bit of the instruction is that of a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction.

Clause 7. The system of clause 6, wherein,

In response to detecting that the start bit of the instruction is the start bit of a forward control instruction, wherein the forward control instruction has a first instruction length n1, enabling control of the forward control instruction buffer to dynamically adjust a first length L1 of the forward control instruction buffer such that a value of L1-n1 enables sampling of the forward control instruction within a sampling window;

in response to detecting that the start bit of the instruction is a start bit of a forward data instruction, wherein the forward data instruction has a second instruction length n2, enabling control of the forward data instruction buffer to dynamically adjust a second length L2 of the forward data instruction buffer such that a value of L2-n2 enables sampling of the forward data instruction within a sampling window;

In response to detecting that the start bit of the instruction is the start bit of a reverse control instruction, wherein the reverse control instruction has a third instruction length n3, enabling control of the reverse control instruction buffer to dynamically adjust a third length L3 of the reverse control instruction buffer such that the value of L3-n3 enables sampling of the reverse control instruction within a sampling window;

In response to detecting that the start bit of the instruction is a start bit of a reverse data instruction, wherein the reverse data instruction has a fourth instruction length n4, control of the reverse data instruction buffer is enabled to dynamically adjust a fourth length L4 of the reverse data instruction buffer such that the value of L4-n4 enables sampling of the reverse data instruction within a sampling window.

Clause 8. The system of any of clauses 2-7, wherein the controller is further configured to pre-store an extension parameter indicating a length L reached by the adjustment of the instruction buffer.

Clause 9. The system of clause 8, wherein the extension parameter is set such that the sampling edge of the clock signal is centered in the instruction signal.

Clause 10. The system of clause 9, wherein the sampling edge is a rising edge and/or a falling edge of the clock signal.

Clause 11. The system of any of clauses 2-9, wherein a period T1 of a step size parameter is set in the controller, and the instruction buffer is extended for a length of time (L-n) x T1.

Clause 12. An electronic device comprising the system of any of clauses 1-11.

Clause 13. A method for sampling instructions, comprising:

dynamically adjusting the length L of the instruction buffer, wherein the instruction buffer is a first-in first-out FIFO buffer and is used for receiving instructions;

The relative position of the instruction and the clock signal leaving the FIFO buffer is adjusted by adjusting the value of L-n such that the samples of the instruction are within a sampling window, wherein the length L of the FIFO buffer is greater than the length n of the instruction.

Clause 14. The method of clause 13, further comprising: detecting the type of the instruction so as to dynamically adjust the length L of the instruction buffer according to the type of the instruction.

Clause 15. The method of clause 14, wherein detecting the type of the instruction comprises detecting whether the start bit of the instruction is that of a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction.

Clause 16. The method of clause 14 or 15, further comprising pre-storing an extension parameter indicating the length L reached by the adjustment of the instruction buffer.

Clause 17. The method of clause 16, wherein the extension parameter is set such that the sampling edge of the clock signal is centered in the instruction signal.

Clause 18. The method of clause 17, wherein the sampling edge is a rising edge and/or a falling edge of the clock signal.

Clause 19. The method of any of clauses 13-18, wherein a period T1 of a step size parameter is set, and the instruction buffer is extended for a length of time (L-n) x T1.

Clause 20. A controller, comprising:

One or more processors; and

A memory having stored therein computer executable instructions that, when executed by the one or more processors, cause the controller to perform the method of any of clauses 13-19.

Claims

1. A system for sampling instructions, comprising: an instruction buffer and a driver, wherein,

The FIFO buffer is configured to receive instructions to or from the driver;

2. The system of claim 1, further comprising a controller coupled to the instruction buffer to dynamically adjust the length L of the instruction buffer.

3. The system of claim 2, wherein the controller is configured to detect the type of the instruction so as to dynamically adjust the length L of the instruction buffer according to the type of the instruction.

4. A system according to claim 2 or 3, wherein the instruction buffer comprises: a forward control instruction buffer and a forward data instruction buffer; the driver comprises a first driver and a second driver; wherein,

5. The system of claim 4, wherein the instruction buffer further comprises a reverse control instruction buffer and a reverse data instruction buffer; the driver further comprises a third driver and a fourth driver, wherein,

6. The system of claim 5, wherein detecting the type of instruction comprises detecting whether a start bit of the instruction is a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction.

7. The system of claim 6, wherein,

8. The system of any of claims 2-7, wherein the controller is further configured to pre-store an extension parameter indicating a length L reached by the adjustment of the instruction buffer.

9. The system of claim 8, wherein the extension parameter is set such that a sampling edge of the clock signal is centered in the instruction signal.

10. The system of claim 9, wherein the sampling edge is a rising edge and/or a falling edge of the clock signal.

11. The system of any of claims 2-9, wherein a period T1 of a step size parameter is set in the controller, and the instruction buffer is extended for a length of time (L-n) x T1.

12. An electronic device comprising the system of any one of claims 1-11.

13. A method for sampling instructions, comprising:

14. The method of claim 13, further comprising: detecting the type of the instruction so as to dynamically adjust the length L of the instruction buffer according to the type of the instruction.

15. The method of claim 14, wherein detecting the type of instruction comprises detecting whether a start bit of the instruction is a forward control instruction, a reverse control instruction, a forward data instruction, or a reverse data instruction.

16. The method of claim 14 or 15, further comprising pre-storing an extension parameter indicating a length L reached by the adjustment of the instruction buffer.

17. The method of claim 16, wherein the extension parameter is set such that a sampling edge of the clock signal is located in the middle of an instruction signal.

18. The method of claim 17, wherein the sampling edge is a rising edge and/or a falling edge of the clock signal.

19. The method of any of claims 13-18, wherein a period T1 of a step size parameter is set, the instruction buffer being extended for a length of time (L-n) x T1.

20. A controller, comprising:

One or more processors; and

A memory having stored therein computer executable instructions that, when executed by the one or more processors, cause the controller to perform the method of any of claims 13-19.