WO2024092437A1 - Data transmission method, device and system - Google Patents


Info

Publication number
WO2024092437A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
groups
ecc
streams
coded
Prior art date
Application number
PCT/CN2022/128727
Other languages
English (en)
French (fr)
Inventor
罗亦林
尹海丰
刘泽伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2022/128727
Publication of WO2024092437A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26: Functional testing

Definitions

  • The present application relates to the field of electronic technology, and in particular to a data transmission method, device and system.
  • FIG. 1 is a schematic diagram of channel transmission with three-level pulse amplitude modulation (Pulse Amplitude Modulation-3, PAM-3) encoding and channel transmission with non-return-to-zero (NRZ) encoding.
  • PAM-3: Pulse Amplitude Modulation-3
  • NRZ: non-return-to-zero
  • NRZ encoding uses 3 unit intervals (UIs) to transmit 3 bits of data over the channel, whereas PAM-3 encoding uses only 2 UIs for the same 3 bits. PAM-3 encoding can therefore increase the interconnection bandwidth by a factor of 1.5 compared with NRZ encoding.
  • UI: Unit Interval
  • The receiving end needs to correct errors using error correction techniques to reduce the bit error rate of the interconnection interface.
  • Existing approaches generally use multi-bit error correction or retransmission, and their error correction processing has a large delay, which reduces data transmission efficiency.
  • The present application provides a data transmission method, device and system that reduce the error correction processing delay in a chip interconnection system and improve data transmission efficiency.
  • According to a first aspect, a data transmission device is provided, which may be a transmitting end or be located at the transmitting end. The device includes: an ECC encoder, configured to perform single-bit error correction ECC encoding on N groups of data bit streams to obtain N groups of ECC encoded bit streams, where each of the N groups of data bit streams includes M data bits, and the encoded bits in each of the N groups of ECC encoded bit streams include the M data bits and P check bits; an interleaver, configured to read the encoded bits from the N groups of ECC encoded bit streams in a polling (round-robin) manner and reorganize them into a first encoded bit stream, where every N encoded bits in the first encoded bit stream form a unit and the N encoded bits in each unit come from different ECC encoded bit streams; an IO data distributor, configured to distribute each unit in the first encoded bit stream to Q IO pins to obtain Q groups of IO data streams
  • The transmitting end performs single-bit ECC encoding on N groups of data bit streams and then interleaves the resulting N groups of ECC encoded bit streams, so that the N bits of each unit in the interleaved bit stream belong to different ECCs. PAM encoding is performed with the unit as the granularity, so when bit errors occur within one unit, they are dispersed across different ECCs, and the receiving end can complete multi-bit error correction using only single-bit error correction.
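The dispersion argument can be checked with a short simulation. The Python sketch below is a minimal illustration under stated assumptions: N = 3 codewords of 128 bits filled with random placeholder bits (not real ECC output), round-robin interleaving of one bit per codeword per round, and a channel error that corrupts all N bits of a single unit. The helper names (`interleave`, `deinterleave`, `errors_per_codeword`) are hypothetical. After deinterleaving, each codeword carries exactly one bit error, which a single-bit corrector can fix.

```python
import random

def interleave(codewords):
    """Round-robin read: one bit from each codeword per round."""
    length = len(codewords[0])
    assert all(len(cw) == length for cw in codewords), "codewords must be equally long"
    return [cw[i] for i in range(length) for cw in codewords]

def deinterleave(stream, n):
    """Inverse of interleave: bit k of the stream belongs to codeword k % n."""
    return [stream[j::n] for j in range(n)]

def errors_per_codeword(sent, received):
    """Count differing bits codeword by codeword."""
    return [sum(a != b for a, b in zip(s, r)) for s, r in zip(sent, received)]

N, LENGTH = 3, 128
rng = random.Random(0)
codewords = [[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(N)]

stream = interleave(codewords)
unit = 5                                   # hypothetical: the unit hit by a channel error
corrupted = list(stream)
for k in range(unit * N, unit * N + N):    # flip every bit of that unit
    corrupted[k] ^= 1

received = deinterleave(corrupted, N)
print(errors_per_codeword(codewords, received))  # [1, 1, 1]
```

The same `deinterleave` step is what the receiving end performs before single-bit ECC decoding.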
  • As a result, the transmitting end can achieve lower processing delay and smaller area and power consumption.
  • Optionally, the device may further include a data grouper, configured to group the original data bit stream into N groups of data bit streams and transmit the N groups of data bit streams to the ECC encoder respectively.
  • In this way, the original data bit stream is divided into N groups of data bit streams that can be encoded with different ECCs, which improves the reliability of the solution.
  • Optionally, all IO data streams in the Q groups of IO data streams have the same data length.
  • Optionally, the IO data distributor evenly distributes the units in the first coded bit stream to the Q IO pins.
  • Optionally, the single-bit error correction ECC encoding includes an extended Hamming code for implementing SEC-DED, such as an optimal minimum odd-weight column code.
  • This is only an example; other extended Hamming codes implementing SEC-DED may also be used.
  • Optionally, N is any one of 3, 4, 6, 8 or 16.
  • In this way, single-bit ECC encoding can be applied in PAM scenarios such as PAM-3, PAM-4, PAM-8 and PAM-16.
  • According to a second aspect, a data transmission device is provided, which may be a receiving end or be located at the receiving end. The device includes: Q IO pins, respectively configured to receive Q groups of PAM coded bit streams; a PAM decoder, configured to perform PAM-N decoding on the Q groups of PAM coded bit streams to obtain Q groups of IO data streams; a first aggregator, configured to aggregate the Q groups of IO data streams to obtain a first coded bit stream, where every N coded bits in the first coded bit stream form a unit; a deinterleaver, configured to read coded bits from each unit of the first coded bit stream and reorganize them into N groups of ECC coded bit streams, where the N coded bits in each unit are allocated to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits; an ECC decoder, configured to perform ECC decoding for single-bit error correction
  • The receiving end performs PAM-N decoding on the Q groups of PAM coded bit streams, then aggregates and deinterleaves the decoded IO data streams, so that the different coded bits of the same unit are distributed to different ECCs.
  • Therefore, the receiving end can complete multi-bit error correction using a single-bit error correction method.
  • As a result, the receiving end can achieve lower processing delay and smaller area and power consumption.
  • Optionally, the device may further include a second aggregator, configured to aggregate the N groups of data bit streams into the original data bit stream.
  • In this way, the original data bit stream can be restored, which improves the reliability of the solution.
  • Optionally, all IO data streams in the Q groups of IO data streams have the same data length.
  • Optionally, the single-bit error correction ECC encoding includes an extended Hamming code for implementing SEC-DED, such as an optimal minimum odd-weight column code.
  • Optionally, N is any one of 3, 4, 6, 8 or 16.
  • According to a third aspect, a data transmission method is provided, comprising: performing single-bit error correction ECC encoding on N groups of data bit streams to obtain N groups of ECC encoded bit streams, where each of the N groups of data bit streams includes M data bits, and the encoded bits in each of the N groups of ECC encoded bit streams include the M data bits and P check bits; reading the encoded bits from the N groups of ECC encoded bit streams in a polling manner and reorganizing them into a first encoded bit stream, where every N encoded bits in the first encoded bit stream form a unit, and the N encoded bits in each unit come from different ECC encoded bit streams; distributing each unit in the first encoded bit stream to Q IO pins to obtain Q groups of IO data streams; performing PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM encoded bit streams; and sending the Q groups of PAM encoded bit streams through the Q IO pins; where
  • Optionally, the method further includes grouping the original data bit stream to obtain the N groups of data bit streams.
  • Optionally, all IO data streams in the Q groups of IO data streams have the same data length.
  • Optionally, the single-bit error correction ECC encoding includes an extended Hamming code for implementing SEC-DED, such as an optimal minimum odd-weight column code.
  • Optionally, N is any one of 3, 4, 6, 8 or 16.
  • According to a fourth aspect, a data transmission method is provided, comprising: receiving Q groups of PAM coded bit streams through Q IO pins; performing PAM-N decoding on the Q groups of PAM coded bit streams to obtain Q groups of IO data streams; aggregating the Q groups of IO data streams to obtain a first coded bit stream, where every N coded bits in the first coded bit stream form a unit; reading coded bits from each unit of the first coded bit stream and reorganizing them into N groups of ECC coded bit streams, where the N coded bits in each unit are allocated to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits; and performing single-bit error correction ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams, where N, M, P and Q are positive integers.
  • Optionally, the method further includes aggregating the N groups of data bit streams into the original data bit stream.
  • Optionally, all IO data streams in the Q groups of IO data streams have the same data length.
  • Optionally, the single-bit error correction ECC encoding includes an extended Hamming code for implementing SEC-DED, such as an optimal minimum odd-weight column code.
  • Optionally, N is any one of 3, 4, 6, 8 or 16.
  • A computer-readable storage medium is further provided, which is used to store instructions.
  • When the instructions are executed, the method according to the third aspect or any possible implementation of the third aspect is implemented, or the method according to the fourth aspect or any possible implementation of the fourth aspect is implemented.
  • A computer program product is further provided, in which instructions are stored; when executed on a computer, the instructions cause the method according to the third aspect or any possible implementation of the third aspect, or the method according to the fourth aspect or any possible implementation of the fourth aspect, to be performed.
  • A data transmission system is further provided, comprising the device according to the first aspect or any possible implementation of the first aspect and the device according to the second aspect or any possible implementation of the second aspect.
  • FIG. 1 is a schematic diagram of PAM-3 coded channel transmission and NRZ coded channel transmission.
  • FIG. 2 is a schematic diagram of NRZ encoding.
  • FIG. 3A is a schematic diagram of PAM-3 encoding.
  • FIG. 3B is a schematic diagram showing a single UI error causing multiple bit errors after PAM-3 decoding.
  • FIG. 4 is a schematic diagram of an application scenario provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a data transmission system provided in an embodiment of the present application.
  • FIG. 6A is a schematic diagram of a data processing flow in the data transmission device 01.
  • FIG. 6B is a schematic diagram of a data processing flow in the data transmission device 02.
  • FIG. 7 is a schematic diagram of a data transmission device 01 provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of data interleaving provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of IO data distribution provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another data transmission device 01 provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another data transmission device 01 provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a specific data processing example.
  • FIG. 13 is a schematic diagram of a data transmission device 02 provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of IO data aggregation provided in an embodiment of the present application.
  • FIG. 15 is a schematic diagram of data deinterleaving provided in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of another data transmission device 02 provided in an embodiment of the present application.
  • FIG. 17 is a schematic diagram of another data transmission device 02 provided in an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a specific data processing example.
  • FIG. 19 is a schematic diagram of a specific single-bit error correction example.
  • Pulse amplitude modulation (PAM) is a modulation method in which the amplitude of the pulse carrier varies with the baseband signal. PAM is divided into many types according to the number of modulation levels. For convenience of description, PAM-N denotes N-level pulse amplitude modulation herein, where N is a positive integer giving the number of level values; for example, PAM-3 has three level values: high, medium and low.
  • Non-return-to-zero (NRZ) coding, also known as two-level pulse amplitude modulation (PAM-2) coding, uses two voltage levels to represent logic 0 and logic 1, for example a positive level for 1 and a low level for 0. Unlike return-to-zero (RZ) coding, it does not need to return to zero within a bit period, so the whole period carries data and the transmission bandwidth is fully utilized.
  • Traditional digital signals mostly use NRZ signaling: two signal levels represent the logic 1 and 0 of a digital signal, and each symbol period (UI) carries 1 bit of logic information.
  • Figure 2 is a schematic diagram of NRZ coding.
  • Three-level pulse amplitude modulation (PAM-3) coding uses 2 ternary level values to represent 3 bits of logic information, that is, 3 bits of logic information are transmitted every two symbol periods.
  • Figure 3A is a schematic diagram of PAM-3 coding. There are 8 combinations of 3 bits (2 cubed) and 9 combinations of two ternary level values (3 squared), so the latter can cover the former. Therefore two levels can represent 3 bits of logic information, and each level effectively carries 1.5 bits of information.
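The counting argument above generalizes: k bits fit into u PAM-N symbols whenever N^u >= 2^k. The helper below (a hypothetical name, using only integer arithmetic) finds the smallest such u and reproduces the NRZ versus PAM-3 comparison of FIG. 1, where 3 bits need 3 UIs with NRZ (PAM-2) and 2 UIs with PAM-3, i.e. 1.5 times the bits per UI.

```python
from fractions import Fraction

def min_uis(bits: int, levels: int) -> int:
    """Smallest number of PAM-`levels` symbols (UIs) that can represent `bits` bits."""
    uis = 0
    while levels ** uis < 2 ** bits:
        uis += 1
    return uis

for levels in (2, 3):
    uis = min_uis(3, levels)
    print(f"PAM-{levels}: 3 bits in {uis} UI(s), {Fraction(3, uis)} bits/UI")
# PAM-2: 3 bits in 3 UI(s), 1 bits/UI
# PAM-3: 3 bits in 2 UI(s), 3/2 bits/UI

# Bandwidth gain of PAM-3 over NRZ for 3-bit groups.
print(Fraction(3, min_uis(3, 3)) / Fraction(3, min_uis(3, 2)))  # 3/2
```

With PAM-3, 8 of the 9 two-symbol combinations are used, leaving one spare pattern.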
  • Besides PAM-3 encoding, there are other multi-level PAM encoding methods such as PAM-4, PAM-6 and PAM-8 encoding, which are not introduced one by one here.
  • Chiplets, also known as core particles or small chips, are pre-manufactured dies with specific functions that can be combined and integrated.
  • Die: a pre-manufactured bare chip
  • A bare chip (die) that implements a specific function can be packaged together with multiple module chips and an underlying base chip using die-to-die interconnection technology to form a system chip.
  • Chiplets can be interconnected through multiple channels (a channel may be implemented as input/output (IO or I/O) pins) to transmit data, achieving low cost and high yield while improving performance.
  • “System” and “network” in the embodiments of the present application can be used interchangeably.
  • “At least one” means one or more, and “plurality” means two or more.
  • “And/or” describes an association relationship between associated objects and indicates that three relationships may exist.
  • For example, A and/or B can mean: only A exists, both A and B exist, or only B exists, where A and B can be singular or plural.
  • The character “/” generally indicates an “or” relationship between the associated objects before and after it.
  • “At least one of the following” or a similar expression refers to any combination of these items, including any combination of single or plural items.
  • For example, at least one of a, b or c can mean: a, b, c, a and b, b and c, a and c, or a, b and c.
  • Ordinal numbers such as “first” and “second” in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the order, timing, priority or importance of those objects.
  • For example, the first priority criterion and the second priority criterion are only used to distinguish different criteria and do not indicate a difference in the content, priority or importance of the two criteria.
  • Figure 4 is a schematic diagram of an application scenario provided in an embodiment of the present application.
  • This scenario illustrates a core particle (chiplet) interconnection system that includes at least two core particles.
  • Figure 4 shows two core particles, core particle A and core particle B, but the number of core particles is not limited to two.
  • Core particles can be interconnected and transmit data through one or more channels.
  • Figure 4 takes the example of core particle A sending data to core particle B, that is, core particle A is the sending end and core particle B is the receiving end.
  • The interconnection channel between chiplets is an in-package channel, which can achieve a lower bit error rate than an inter-chip interconnection channel.
  • The industry generally specifies a bit error ratio (BER) before error correction of less than 1e-15.
  • BER: Bit Error Ratio
  • The transmission delay in a chiplet interconnection system affects overall system performance. Error correction should therefore add as little transmission delay as possible, and the bit error rate of the interconnect should be improved at the lowest possible implementation cost.
  • Multi-level coding can be used to encode data and increase the channel interconnection bandwidth.
  • However, a channel error may produce multiple bit errors after decoding. For example, PAM-3 coding transmits 3 bits of data every 2 UIs in the channel; if an error affecting 1 UI occurs in the channel, 2 to 3 bit errors may result after decoding.
  • FIG. 3B is a schematic diagram of a single UI error causing multiple bit errors after PAM-3 decoding.
  • The receiving end samples the level of each UI: 00 represents the low level (-1), 01 the intermediate level (0), and 11 the high level (1).
  • The levels of two consecutive UIs restore three data bits. For example, if the original data sent by the transmitting end is 010, the correct levels of the two UIs are 00 (low level) and 01 (intermediate level). Suppose the level of the first UI is mis-sampled as the intermediate level 01, so that two intermediate levels are received. This coincides with the pattern used to send "101", so the PAM-3 decoder outputs "101", which differs from the correct data "010" in all 3 bits.
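The FIG. 3B example can be reproduced in a few lines. Only two entries of the code table below come from the text (010 maps to low/intermediate and 101 to intermediate/intermediate); the remaining six entries, and the table names `ENCODE`, `DECODE` and `SAMPLE`, are illustrative assumptions, since the full PAM-3 mapping is not given here.

```python
# Hypothetical PAM-3 code table: only 010 -> (-1, 0) and 101 -> (0, 0)
# come from the text; the other entries are illustrative assumptions.
ENCODE = {
    "000": (-1, -1),
    "001": (-1, +1),
    "010": (-1, 0),
    "011": (+1, -1),
    "100": (+1, +1),
    "101": (0, 0),
    "110": (0, -1),
    "111": (0, +1),
}  # (+1, 0) is the one unused level pair
DECODE = {levels: bits for bits, levels in ENCODE.items()}

# Sampled 2-bit representation of each level, as described for FIG. 3B.
SAMPLE = {-1: "00", 0: "01", +1: "11"}

def bit_errors(a: str, b: str) -> int:
    """Number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

sent = "010"
levels = ENCODE[sent]                      # (-1, 0): low, intermediate
print([SAMPLE[v] for v in levels])         # ['00', '01']
received_levels = (0, levels[1])           # first UI mis-sampled as intermediate
decoded = DECODE[received_levels]
print(decoded, bit_errors(sent, decoded))  # 101 3
```

A single wrong UI thus turns into a 3-bit error after decoding, which is why the bits of one PAM-3 unit must not all land in the same single-bit ECC codeword.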
  • The receiving end therefore needs to perform error correction to reduce the bit error rate of the interconnection interface.
  • Reed-Solomon (RS) (450, 406) codes can be used to detect and correct errors in the chip interconnection system.
  • In RS coding, every 9 bits form one symbol, and 44 check symbols are added to 406 symbols of original data, supporting correction of up to 22 symbol errors.
  • OAM: Operations Administration and Maintenance
  • The 406 data symbols consist of 45 81-bit data blocks and one 9-bit Operations Administration and Maintenance (OAM) field.
  • Encoding adds 44 check symbols, producing 450 symbols in total.
  • After decoding, the error-corrected 406 symbols of original data are output, from which the 45 81-bit data blocks and the 9-bit OAM field are restored.
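The symbol bookkeeping in the RS(450, 406) description can be checked with a few lines of arithmetic. The sketch relies on the standard Reed-Solomon property that n - k check symbols correct up to (n - k) / 2 symbol errors; it also notes that 9-bit symbols allow codewords of up to 2^9 - 1 = 511 symbols, so RS(450, 406) is a shortened code.

```python
SYMBOL_BITS = 9
N_SYMBOLS, K_SYMBOLS = 450, 406  # RS(450, 406)

payload_bits = 45 * 81 + 9                       # 45 data blocks of 81 bits plus a 9-bit OAM field
assert payload_bits == K_SYMBOLS * SYMBOL_BITS  # 3654 bits = 406 symbols
assert N_SYMBOLS <= 2 ** SYMBOL_BITS - 1        # fits in GF(2^9): shortened from 511

check_symbols = N_SYMBOLS - K_SYMBOLS   # 44
t = check_symbols // 2                  # RS corrects up to (n - k) / 2 symbol errors
print(payload_bits, check_symbols, t)   # 3654 44 22
print(N_SYMBOLS * SYMBOL_BITS)          # 4050 coded bits per codeword
```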
  • However, RS encoding and decoding are complex, resulting in a large processing delay (on the order of tens of ns) that cannot meet the low-latency requirements of chiplet interface interconnection.
  • Alternatively, BCH codes can be used to detect and correct errors in the data in the chiplet interconnection system.
  • BCH coding: BCH codes are often used to correct multiple random errors and can be designed according to the number of bits to be corrected and the length of the information bits.
  • A BCH code is described by n, the codeword length; t, the number of correctable bit errors; and k, the number of information bits, where n = 2^m - 1.
  • For example, BCH(255, 239, 2) corresponds to m = 8.
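Under the standard bound for primitive binary BCH codes (n = 2^m - 1, with at most m * t parity bits), the BCH(255, 239, 2) example follows directly from m = 8. The helper name below is hypothetical; it returns the lower bound on k, which is met with equality for this particular code.

```python
def bch_params(m: int, t: int):
    """Primitive binary BCH: n = 2**m - 1 and at most m*t parity bits, so k >= n - m*t."""
    n = 2 ** m - 1
    k_min = n - m * t
    return n, k_min

n, k_min = bch_params(m=8, t=2)
print(n, k_min)  # 255 239
```

Doubling t roughly doubles the parity bits, and the decoder (syndrome computation plus error-locator solving) grows accordingly, which is one reason multi-bit codes cost more latency than single-bit ECC.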
  • Alternatively, data retransmission can be used to correct errors.
  • A sequence number and a cyclic redundancy check (CRC) are added to the data, and the sent data is stored in a send buffer. After receiving the data, the receiving end verifies the data bits using the CRC. If the data is correct, the receiving end sends an acknowledgment (ACK) to the sending end, which then releases the data from the send buffer. If the data is erroneous, the receiving end sends a negative acknowledgment (NACK) carrying the corresponding sequence number, and the sending end retransmits the message with that sequence number.
  • CRC: cyclic redundancy check
  • However, every sent message must be buffered, the receiving end must return ACK/NACK information in real time, a bidirectional link is required, and part of the reverse path's transmission bandwidth is occupied, so the implementation cost and power consumption are high. Moreover, when data errors occur, the sender must retransmit to correct them, so the processing delay is still large.
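A toy model makes the costs listed above concrete: the sender must buffer every frame until it is acknowledged, and an error costs a full round trip. This Python sketch is illustrative only; it uses `zlib.crc32` as the CRC, and the `Sender` class and `receive` function are hypothetical names. Timeouts, sequence-number wraparound and reverse-link latency are deliberately ignored.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class Sender:
    """Toy sender that keeps every frame in a send buffer until it is ACKed."""
    buffer: dict = field(default_factory=dict)
    next_seq: int = 0

    def send(self, payload: bytes):
        seq = self.next_seq
        self.next_seq += 1
        frame = (seq, payload, zlib.crc32(payload))
        self.buffer[seq] = frame          # cached for possible retransmission
        return frame

    def on_ack(self, seq: int):
        self.buffer.pop(seq, None)        # ACK: release the buffered frame

    def on_nack(self, seq: int):
        return self.buffer[seq]           # NACK: resend the frame with that sequence number

def receive(frame):
    """Toy receiver: verify the CRC and reply ("ACK" or "NACK", seq)."""
    seq, payload, crc = frame
    return ("ACK" if zlib.crc32(payload) == crc else "NACK", seq)

tx = Sender()
seq, payload, crc = tx.send(b"hello")
corrupted = (seq, b"hellp", crc)          # simulated channel error
reply, s = receive(corrupted)             # ("NACK", 0)
if reply == "NACK":
    reply, s = receive(tx.on_nack(s))     # retransmission arrives intact
if reply == "ACK":
    tx.on_ack(s)
print(reply, len(tx.buffer))              # ACK 0
```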
  • To address these issues, the embodiments of the present application provide a technical solution that implements error correction in a chip interconnection system using single-bit error correction, with lower latency and smaller area and power consumption than multi-bit error correction or retransmission.
  • FIG. 5 is a schematic diagram of a data transmission system provided in an embodiment of the present application. The system includes a data transmission device 01 and a data transmission device 02.
  • The data transmission device 01 is the transmitting end or a device located at the transmitting end, such as core particle A in the scenario shown in Figure 4.
  • The data transmission device 01 includes an error correction code (ECC) encoder 11, an interleaver 12, an IO data distributor 13 and a PAM encoder 14.
  • ECC: error correction code
  • The data transmission device 02 is the receiving end or a device located at the receiving end, such as core particle B in the scenario shown in FIG. 4.
  • The data transmission device 02 includes a PAM decoder 21, a first aggregator 22, a deinterleaver 23 and an ECC decoder 24.
  • The data transmission system further includes multiple IO pins. Each IO pin consists of several parts: one part is located in the data transmission device 01, one part in the data transmission device 02, and another part connects the two devices, serving as the interconnection channel that carries the data sent by the data transmission device 01 to the data transmission device 02.
  • The data transmission device 01 performs single-bit ECC encoding, interleaving and PAM-N encoding on the original data, and then transmits the processed data stream through the channel to the data transmission device 02.
  • The data transmission device 02 performs PAM-N decoding, deinterleaving and single-bit ECC decoding on the received data to restore the original data.
  • The data processing scheme in the data transmission device 01 is described below.
  • FIG. 6A is a schematic diagram of the data processing flow in the data transmission device 01.
  • The ECC encoder 11 performs single-bit error correction ECC encoding on N groups of data bit streams to obtain N groups of ECC encoded bit streams.
  • N is a positive integer greater than 2; for example, N is any one of 3, 4, 6, 8 or 16.
  • Each of the N groups of data bit streams includes M data bits, and the coded bits in each of the N groups of ECC coded bit streams include the M data bits and P check bits.
  • The number of groups N of the data bit streams in step S601A corresponds to the encoding method of the PAM encoder 14 in step S604A.
  • The single-bit error correction ECC encoding includes an extended Hamming code for implementing single-error correction and double-error detection (SEC-DED).
  • SEC-DED: single-error correction and double-error detection
  • For example, the optimal minimum odd-weight column code or another extended Hamming code implementing SEC-DED may be used; this is not limited in this application.
  • For example, if the encoding method is PAM-3 encoding and each group of data bit streams includes 120 data bits, 8 check bits can be provided for each group, so each group of ECC encoded bit streams outputs 128 bits.
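The 8 check bits in this example follow from the usual Hamming bound: 7 bits satisfy 2^7 >= 120 + 7 + 1 for single-error correction, and one more overall parity bit adds double-error detection. The Python sketch below is a minimal SEC-DED illustration, assuming a textbook extended Hamming code (parity bits at power-of-two positions plus one overall parity bit) rather than the optimal minimum odd-weight column (Hsiao) code named in the text; both correct one error and detect two with 8 check bits over 120 data bits, but their check matrices differ. The function names are hypothetical.

```python
import random

def hamming_check_bits(m: int) -> int:
    """Check bits for SEC-DED: smallest r with 2**r >= m + r + 1, plus 1 overall parity bit."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r + 1

def encode(data):
    """Extended Hamming encoding: word[1..n] is a Hamming codeword, word[0] is overall parity."""
    r = hamming_check_bits(len(data)) - 1
    n = len(data) + r
    word = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                     # not a power of two: data position
            word[pos] = next(bits)
    for i in range(r):                          # parity bit at each power-of-two position
        p = 1 << i
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    word[0] = sum(word) % 2                     # makes total parity of all n + 1 bits even
    return word

def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'detected' (uncorrectable)."""
    word = list(word)
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos                     # XOR of positions holding a 1
    odd_parity = sum(word) % 2
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1 and syndrome <= n:     # treat as a single error
        word[syndrome] ^= 1                     # syndrome 0 means word[0] itself flipped
        status = "corrected"
    else:                                       # e.g. even parity with nonzero syndrome
        return None, "detected"
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status

rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(120)]
cw = encode(data)
print(hamming_check_bits(120), len(cw))        # 8 128

one = list(cw)
one[37] ^= 1                                   # single-bit error
print(decode(one)[1], decode(one)[0] == data)  # corrected True

two = list(cw)
two[5] ^= 1                                    # double-bit error
two[90] ^= 1
print(decode(two)[1])                          # detected
```

Decoding needs only an XOR tree for the syndrome and one bit flip, which is the low-latency property the single-bit approach relies on.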
  • Different ECC coded bit streams correspond to different error correction codes (ECCs), for example an optimal minimum odd-weight column code.
  • Optionally, there are N ECC encoders 11, as shown in FIG. 7. The N ECC encoders 11 respectively perform single-bit error correction ECC encoding on the N groups of data bit streams, and different ECC encoders 11 use different ECCs, so that different ECC encoded bit streams correspond to different ECCs.
  • ECC encoder 11-1 performs single-bit error correction ECC encoding on the first group of data bit streams to obtain the first group of ECC encoded bit streams;
  • ECC encoder 11-2 performs single-bit error correction ECC encoding on the second group of data bit streams to obtain the second group of ECC encoded bit streams;
  • and so on, until ECC encoder 11-N performs single-bit error correction ECC encoding on the Nth group of data bit streams to obtain the Nth group of ECC encoded bit streams.
  • The interleaver 12 reads coded bits from the N groups of ECC coded bit streams in a polling manner and reassembles them into a first coded bit stream.
  • Reading in a polling manner means that the interleaver 12 performs multiple read operations on the N groups of ECC coded bit streams; one pass in which it performs a read operation on each of the N groups is one round.
  • In each round, the interleaver 12 reads 1 bit from each of the N groups of ECC coded bit streams, so each round reads N bits in total, one from each group. After completing one round, the interleaver 12 performs the next round, until all coded bits in the N groups of ECC coded bit streams have been read.
  • The interleaver 12 reads the N groups in the same order in every round; this order can be called the first polling order.
  • For example, with N = 3, the first polling order is the first group of ECC coded bit streams, then the second group, then the third group.
  • This is only an example and not a limitation.
  • The first coded bit stream can be divided, in bit order, into units of N coded bits each, and the N coded bits in each unit come from different ECC coded bit streams.
  • For example, bits 1 to 3 form a unit (each from a different one of the 3 groups of ECC encoded bit streams), bits 4 to 6 form the next unit (likewise), and so on.
  • In addition, the N coded bits read in each round occupy the same relative position within their respective ECC coded bit streams.
  • For example, the three groups of ECC encoded bit streams are A_0/A_1/A_2..., B_0/B_1/B_2... and C_0/C_1/C_2..., respectively.
  • In the first round, the interleaver 12 sequentially reads the first bit of the first group (A_0), the first bit of the second group (B_0) and the first bit of the third group (C_0); in the second round, it sequentially reads the second bit of each group (A_1, B_1, C_1); and so on, finally obtaining the first coded bit stream A_0/B_0/C_0/A_1/B_1/C_1...A_N/B_N/C_N.
  • Because reorganizing the coded bits read from the N groups of ECC coded bit streams into the first coded bit stream is essentially a rearrangement of those bits, this process can also be called "interleaving".
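The polling order and the unit structure described above can be traced with labels instead of bits. In this sketch the strings `A_i`, `B_i`, `C_i` stand in for coded bits, the stream length of 4 is a hypothetical value chosen for brevity, and `interleave`/`units` are illustrative helper names.

```python
def interleave(streams):
    """Poll the streams in a fixed (first polling) order, one bit per stream per round."""
    rounds = len(streams[0])
    assert all(len(s) == rounds for s in streams), "streams must be equally long"
    out = []
    for i in range(rounds):          # one round per bit position
        for s in streams:            # first polling order: group 1, 2, ..., N
            out.append(s[i])
    return out

def units(stream, n):
    """Split the first coded bit stream into consecutive units of n bits."""
    return [stream[k:k + n] for k in range(0, len(stream), n)]

A = [f"A_{i}" for i in range(4)]
B = [f"B_{i}" for i in range(4)]
C = [f"C_{i}" for i in range(4)]
first = interleave([A, B, C])
print("/".join(first))
# A_0/B_0/C_0/A_1/B_1/C_1/A_2/B_2/C_2/A_3/B_3/C_3
print(units(first, 3)[1])  # ['A_1', 'B_1', 'C_1']
```

Each unit holds one bit from every stream, all at the same relative position, matching the two properties stated above.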
  • The IO data distributor 13 distributes each unit in the first coded bit stream to the Q IO pins to obtain Q groups of IO data streams.
  • Because the IO data distributor 13 distributes the bits of the first coded bit stream at unit granularity, the N bits of each unit sent to an IO pin are guaranteed to come from different ECC coded bit streams and thus correspond to different ECCs.
  • FIG. 9 is a schematic diagram of distributing each unit in the first coded bit stream to Q IO pins.
  • Optionally, the IO data distributor 13 evenly distributes the units in the first coded bit stream to the Q IO pins.
  • In this case, the Q groups of IO data streams all have the same data length, which improves the parallelism and thus the efficiency of data processing.
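Unit-granular, even distribution can be sketched as follows, using the sizes of the worked example in this document (3 x 128 interleaved bits over 64 pins, 6 bits per pin). The text does not specify which unit goes to which pin, so the round-robin assignment (unit k to pin k mod Q) and the helper name `distribute` are assumptions; integers stand in for coded bits so their origin stays visible.

```python
def distribute(stream, n, q):
    """Deal n-bit units round-robin to q pins (unit k -> pin k % q).

    Assumes the number of units is a multiple of q so every pin gets
    the same number of bits; the round-robin order is illustrative only.
    """
    assert len(stream) % n == 0, "stream must consist of whole units"
    unit_list = [stream[k:k + n] for k in range(0, len(stream), n)]
    assert len(unit_list) % q == 0, "units must divide evenly over the pins"
    pins = [[] for _ in range(q)]
    for k, unit in enumerate(unit_list):
        pins[k % q].extend(unit)     # a unit is never split across pins
    return pins

stream = list(range(384))            # stands in for 3 x 128 interleaved coded bits
pins = distribute(stream, n=3, q=64)
print(len(pins), {len(p) for p in pins})  # 64 {6}
print(pins[0])                            # [0, 1, 2, 192, 193, 194]
```

Keeping each unit on one pin is what lets the PAM encoder of that pin map a whole unit to its symbols.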
  • The PAM encoder 14 performs PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM encoded bit streams.
  • For example, with PAM-3 encoding, the 3 bits of each unit are encoded as 2 ternary level values.
  • For the specific encoding principle, refer to the description of Figure 3A above; details are not repeated here.
  • Optionally, each IO pin corresponds to one PAM encoder 14: with Q IO pins there are also Q PAM encoders 14, and each PAM encoder performs PAM-N encoding on the IO data stream of its corresponding IO pin.
  • The Q groups of PAM encoded bit streams are sent out through the Q IO pins respectively.
  • N, M, P, and Q are all positive integers.
  • the data transmission device 01 further includes a data grouper 15 for grouping the original data bit stream to obtain N groups of data bit streams; and transmitting the N groups of data bit streams to the ECC encoder 11 .
  • data transmission device 01 and data transmission device 02 are interconnected through 64 IO pins (i.e., IO 0 to IO 63).
  • Data transmission device 01 needs to perform grouping, ECC encoding, interleaving, IO data distribution, and PAM-3 encoding on the original data before sending the data.
  • the specific processing flow is as follows:
  • the data grouper 15 divides the 360-bit original data (represented as D[120*3-1:0] in Figure 12, referring to 360 bits of data from 0 to 120*3-1) into 3 groups on average, each with 120 bits, namely DA[119:0], DB[119:0], and DC[119:0].
  • Each grouped data corresponds to an ECC encoder 11, corresponding to ECC encoder A, ECC encoder B, and ECC encoder C, respectively.
  • Each ECC encoder 11 uses the minimum odd weight column code as the ECC encoding algorithm, adds 8-bit check bits (C[7:0]) to each group of 120-bit input data (D[119:0]), and outputs a total of 128 bits of data.
  • the interleaver 12 reorganizes the three groups of 128-bit data ({CC[7:0],DC[119:0]}, {CB[7:0],DB[119:0]}, {CA[7:0],DA[119:0]}) output by the three ECC encoders 11 into a group of parallel data, i.e., the first coded bit stream.
  • the specific interleaving method is to poll the three groups of data, take 1 bit from the group each time, and so on.
  • the final output data is: CC[7], CB[7], CA[7], ..., CC[0], CB[0], CA[0], DC[119], DB[119], DA[119], ..., DC[1], DB[1], DA[1], DC[0], DB[0], DA[0].
  • the IO data distributor 13 distributes the interleaved data in units of 3 consecutive bits to the 64 IO pins in bit order, and each IO pin transmits 6 bits of IO data.
  • the PAM-3 encoder 14 performs PAM-3 encoding on the data of each IO pin, so that every three data bits are transmitted in two UIs, thereby obtaining 64 groups of PAM encoded bit streams, each consisting of four PAM-3 symbols (four UIs).
  • the data transmission device 01 may also include other components, and the data transmission device 01 may also perform other processing on the data before sending the data.
  • the data transmission device 01 may also perform scrambling, repair, and other processing on the data before PAM encoding, and after PAM encoding it may also perform parallel-to-serial conversion on the PAM-encoded data, etc.; this application does not limit this.
  • the positions of some components in the data transmission device 01 can also be swapped.
  • the IO data distributor 13 can also be set after the PAM encoder 14, that is, the data transmission device 01 can first perform PAM-N encoding on the first encoded bit stream, and then perform IO distribution on the PAM-N encoded data.
  • the following introduces the data processing solution at the receiving end (i.e., the data transmission device 02).
  • FIG. 6B is a schematic diagram of the data processing flow in the data transmission device 02 .
  • the PAM decoder 21 performs PAM-N decoding on the Q groups of PAM encoded bit streams to obtain Q groups of IO data streams.
  • the PAM decoder 21 obtains Q groups of PAM coded bit streams from the Q IO pins respectively, and the Q groups of PAM coded bit streams are sent by the data transmission device 01 to the Q IO pins.
  • the decoding method of the PAM decoder 21 corresponds to the encoding method of the PAM encoder 14 in the data transmission device 01.
  • the value of N is a positive integer greater than 2, for example, the value of N is any one of 3, 4, 6, 8 or 16.
  • the decoding method of the PAM decoder 21 is PAM-3 decoding, that is, the level value of every 2 UI in the PAM encoded bit stream is decoded into 3 bits of data.
  • the specific decoding principle can refer to the relevant content shown in Figure 3A above, which will not be repeated here.
  • each IO pin corresponds to a PAM decoder 21
  • the number of IO pins is Q
  • the number of PAM decoders 21 is also Q
  • each PAM decoder is used to perform PAM-N decoding on the IO data stream on the IO pin corresponding to the PAM decoder.
  • the lengths of each group of PAM coded bit streams in the Q groups of PAM coded bit streams are the same.
  • the first aggregator 22 aggregates the Q groups of IO data streams to obtain a first coded bit stream, where every N coded bits in the first coded bit stream constitute a unit.
  • S602B is the reverse of S603A above. Specifically, the first aggregator 22 can treat every N consecutive coded bits in each of the Q groups of IO data streams as a unit and reassemble the units into the first coded bit stream in order.
  • the order in which the first aggregator 22 reorganizes the units corresponds to the order in which the IO data distributor 13 in the data transmission device 01 distributes the units.
  • for example, when the IO data distributor 13 distributes the units in the first coded bit stream to the Q IO pins in the order shown in FIG. 9,
  • the first aggregator 22 reorganizes the units in the Q groups of IO data streams into the first coded bit stream in the order shown in FIG. 14.
  • taking N = 3 as an example, the 1st to 3rd bits of the first coded bit stream form a unit, the 4th to 6th bits form a unit, and so on.
  • the order in which the first aggregator 22 reorganizes the units and the order in which the IO data distributor 13 distributes the units can be specified by a protocol, or configured by other control devices, or agreed upon in advance by the data transmission device 01 and the data transmission device 02, or configured by the data transmission device 01 and notified to the data transmission device 02, or configured by the data transmission device 02 and notified to the data transmission device 01, etc., and this application does not impose any restrictions.
  • the deinterleaver 23 reads coded bits from each unit of the first coded bit stream and reorganizes them into N groups of ECC coded bit streams.
  • S603B is the opposite process to the above S602A, which may be called "de-interleaving".
  • the process of the deinterleaver 23 reading the coded bits from each unit of the first coded bit stream can also be described in a polling manner, wherein the process of the deinterleaver 23 reading a unit in the first coded bit stream and allocating each bit in the unit to the ECC coded bit stream is one round.
  • the deinterleaver 23 reads one unit, that is, N consecutive bits, from the first coded bit stream in each round, and then allocates the N bits to the N groups of ECC coded bit streams in reading order; when the deinterleaver 23 completes one round of reading and allocation, it performs the next round, until all units in the first coded bit stream have been read and allocated.
  • the 1st to 3rd bits in the first coded bit stream are a unit, and the 1st to 3rd bits are respectively allocated to 3 groups of different ECC coded bit streams; the 4th to 6th bits in the first coded bit stream are a unit, and the 4th to 6th bits are respectively allocated to 3 groups of different ECC coded bit streams, ..., and this cycle is repeated until the coded bits in the first coded bit stream are all allocated, and N groups of ECC coded bit streams are obtained.
  • the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits.
  • the order in which the deinterleaver 23 reads and allocates the coded bits in each round is the same, and this order can be called the second polling order. It can be understood that the second polling order here corresponds to the first polling order mentioned above (i.e., the order in which the interleaver 12 reads the coded bits from the N groups of ECC coded bit streams in each round).
  • the second polling order corresponds to the first coded bit in each unit (allocated to the first group of ECC coded bit streams), the second coded bit (allocated to the second group of ECC coded bit streams), and the third coded bit (allocated to the third group of ECC coded bit streams).
  • the coded bits read by the deinterleaver 23 in each round occupy the same relative position within the ECC coded bit streams to which they are allocated.
  • the deinterleaver 23 reads the first unit (i.e., the first to third bits) in the first coded bit stream, and allocates the first bit (i.e., A_0) in the first unit to the first group of ECC coded bit streams, allocates the second bit (i.e., B_0) in the first unit to the second group of ECC coded bit streams, and allocates the third bit (i.e., C_0) in the first unit to the third group of ECC coded bit streams;
  • the deinterleaver 23 reads the second unit (i.e., the 4th to 6th bits) in the first coded bit stream, allocates the first bit (i.e., A_1) in the second unit to the first group of ECC coded bit streams, allocates the second bit (i.e., B_1) to the second group, and allocates the third bit (i.e., C_1) to the third group, ..., and repeats this cycle to finally obtain three groups of ECC coded bit streams: A_0/A_1/A_2..., B_0/B_1/B_2..., C_0/C_1/C_2....
  • the ECC decoder 24 performs single-bit error correction ECC decoding on the N groups of ECC encoded bit streams to obtain N groups of data bit streams.
  • the ECC decoding method used by the ECC decoder 24 corresponds to the ECC encoding method used by the ECC encoder 11 in the data transmission device 01.
  • ECC is specifically, for example, an optimal minimum odd-weight column code or other extended Hamming code for implementing SEC-DED, which is not limited in this application.
  • the ECC encoding and decoding method can be specifically specified by the protocol, or configured by other control devices, or agreed in advance by the data transmission device 01 and the data transmission device 02, or configured by the data transmission device 01 and notified to the data transmission device 02, or configured by the data transmission device 02 and notified to the data transmission device 01, etc., which is not limited in this application.
  • the number of ECC decoders 24 is N, as shown in FIG. 16; the N ECC decoders 24 perform single-bit error correction ECC decoding on the N groups of ECC coded bit streams respectively, and different ECC decoders 24 correspond to different ECCs.
  • the ECC decoder 24-1 is used to perform single-bit error correction ECC decoding on the first group of ECC coded bit streams to obtain the first group of data bit streams
  • the ECC decoder 24-2 is used to perform single-bit error correction ECC decoding on the second group of ECC coded bit streams to obtain the second group of data bit streams
  • the ECC decoder 24-N is used to perform single-bit error correction ECC decoding on the Nth group of ECC coded bit streams to obtain the Nth group of data bit streams.
  • the data transmission device 02 further includes a second aggregator 25 for aggregating the N groups of data bit streams to obtain the original data bit stream.
  • the data transmission device 01 and the data transmission device 02 are interconnected through 64 IO pins (i.e., IO 0 to IO 63).
  • the data transmission device 01 outputs 64 groups of PAM coded bit streams
  • the data transmission device 02 receives the 64 groups of PAM coded bit streams and performs PAM-3 decoding, IO data aggregation, deinterleaving, ECC decoding, etc.
  • the specific processing flow is as follows:
  • the PAM decoder 21 performs PAM-3 decoding on the 64 groups of PAM coded bit streams.
  • the PAM-3 decoding is implemented to recover 3 data bits from the received data of 2 UIs of each IO, and obtain 64 groups of IO data streams:
  • the first aggregator 22 aggregates the 64 groups of IO data streams into one group of data streams, specifically taking 3 consecutive bits as a unit, and taking 3 bits of data in sequence from IO 0 to IO 63 in the order of IO, and finally forming 384 bits of data: CC[7], CB[7], CA[7], ..., CC[0], CB[0], CA[0], DC[119], DB[119], DA[119], ..., DC[1], DB[1], DA[1], DC[0], DB[0], DA[0].
  • the deinterleaver 23 divides the data output by the first aggregator 22 into three groups of 128-bit data: in bit order, with 3 bits as a unit, it takes 1 bit at a time to form the three groups {CC[7:0],DC[119:0]}, {CB[7:0],DB[119:0]}, and {CA[7:0],DA[119:0]}.
  • the 128-bit data {C[7:0], D[119:0]} received by each of ECC decoders A, B, and C includes 120 data bits and 8 check bits.
  • Each ECC decoder determines whether the received D[119:0] data needs error correction by calculating the syndrome S = H · [D[119:0], C[7:0]]^T (mod 2), that is, S = H1 · D^T + C^T (mod 2):
  • H is an 8*128 matrix consisting of the 8x120 matrix H1 and the 8x8 identity matrix I1, i.e., H = [H1, I1].
  • S has 8 bits in total. When S is 0, it means no error has occurred, and D[119:0] is directly output as decoded data. When S is not 0 and is equal to the value of a column in the H check matrix, it means that an error has occurred in the bit corresponding to the column, and the bit is inverted to restore the original data. After ECC decoding, the corrected 120-bit data D[119:0] is output.
  • the data transmission device 02 may also include other components, and the data transmission device 02 may also perform other processing on the data.
  • the data transmission device 02 may also perform serial-to-parallel processing on the data before PAM decoding, and after PAM decoding, may also perform repair, descrambling, and other processing on the decoded data, etc., and this application does not limit this.
  • the positions of some components in the data transmission device 02 can be swapped.
  • the first aggregator 22 can also be set before the PAM decoder 21, that is, the data transmission device 02 can first aggregate the data of each IO pin, and then uniformly perform PAM-N decoding on the aggregated data.
  • the Q IO pins described above belong to the same Lane, where lane refers to a collection of all physical layer pins that share the same on-link clock source.
  • the method described above takes data transmission in one Lane as an example.
  • the data transmission device 01 and the data transmission device 02 can transmit in multiple Lanes at the same time to increase the total bit width of the entire transmission interface. It can be understood that in the case of multiple Lane transmissions, the data processing flow corresponding to each lane can refer to the method flow described above.
  • the transmitting end performs single-bit ECC encoding on N groups of data bit streams, and then interleaves the encoded N groups of ECC encoded bit streams to ensure that the N bits in a unit in the interleaved bit stream correspond to different ECCs, and performs PAM encoding with the unit as the granularity.
  • the receiving end can complete multi-bit error correction processing through a single-bit error correction method.
  • Figure 19 is a schematic diagram of error correction. Assume that external interference causes an error in one UI of IO_0 while all other UIs and IOs are normal. For example, the transmitted "010" is decoded as "101" after PAM-3 decoding at the receiving end; however, these three bits belong to three different ECCs, and each ECC can correct a 1-bit error. Therefore, after ECC decoding the three bits are corrected back to "010", which achieves 3-bit error correction capability with a single-bit error correction method.
  • the embodiments of the present application can implement error correction using single-bit error correction technology in a chip interconnection system, which can achieve lower latency and smaller area power consumption compared to multi-bit error correction technology or retransmission technology.
  • an embodiment of the present application also provides a computer-readable storage medium, which is used to store instructions. When the instructions are executed, the method shown in Figure 6A or Figure 6B is implemented.
  • an embodiment of the present application further provides a computer program product, in which instructions are stored.
  • when the computer program product is run on a computer, the method shown in FIG. 6A or FIG. 6B is executed.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Error Detection And Correction (AREA)

Abstract

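This application discloses a data transmission method, device, and system, which reduce the error-correction processing latency in a chiplet interconnection system and improve data transmission efficiency. Before sending data, the transmitting end applies single-bit error-correcting ECC encoding, interleaving, PAM-N encoding, and other processing to the original data. After receiving data, the receiving end applies PAM-N decoding, deinterleaving, single-bit error-correcting ECC decoding, and other processing to recover the original data. This application enables error correction with single-bit error correction technology in a chiplet interconnection system. Compared with multi-bit error correction or retransmission, it achieves lower latency and smaller area and power consumption.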

Description

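A data transmission method, device and system
Technical Field
This application relates to the field of electronic technology, and in particular, to a data transmission method, device, and system.
Background
In a chiplet interconnection system, chiplets can be interconnected through multiple channels to transmit data. For a fixed number of channels, raising the rate of each channel increases the interconnection bandwidth. Beyond a certain rate, however, further increases reduce energy efficiency. A multi-level encoding scheme in the channel increases the signal transmission bandwidth while the baud rate stays the same. For example, FIG. 1 is a schematic diagram of channel transmission with three-level pulse amplitude modulation (Pulse Amplitude Modulation-3, PAM-3) encoding and with non-return-to-zero (Non-Return-to-Zero, NRZ) encoding. NRZ encoding needs 3 unit intervals (Unit Interval, UI) to transmit 3 bits of data in the channel, while PAM-3 encoding needs only 2 UIs, so PAM-3 increases the interconnection bandwidth to 1.5 times that of NRZ.
With multi-level encoding, a raw error in 1 UI of the channel may produce multi-bit errors after decoding. For example, with PAM-3 encoding every 2 UIs in the channel carry 3 bits of data, so a raw error in 1 UI may produce 2 to 3 bit errors after decoding. The receiving end therefore needs error correction to reduce the bit error ratio of the interconnect interface.
However, the prior art generally relies on multi-bit error correction or retransmission, both of which have a large processing latency and reduce data transmission efficiency.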
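Summary
This application provides a data transmission method, device, and system, which reduce the error-correction processing latency in a chiplet interconnection system and improve data transmission efficiency.
According to a first aspect, a data transmission device is provided. The device may be a transmitting end or may be located at a transmitting end, and includes the following components:
an ECC encoder, configured to perform single-bit error-correcting ECC encoding on N groups of data bit streams to obtain N groups of ECC coded bit streams. Each of the N groups of data bit streams includes M data bits, and the coded bits in each of the N groups of ECC coded bit streams include the M data bits and P check bits;
an interleaver, configured to read coded bits from the N groups of ECC coded bit streams in a polling manner and reorganize them into a first coded bit stream. Every N coded bits in the first coded bit stream form a unit, and the N coded bits in each unit come from different ECC coded bit streams;
an IO data distributor, configured to distribute the units of the first coded bit stream to Q IO pins to obtain Q groups of IO data streams;
PAM encoders, configured to perform PAM-N encoding on the Q groups of IO data streams respectively to obtain Q groups of PAM coded bit streams;
and Q IO pins, configured to send the Q groups of PAM coded bit streams respectively.
N, M, P, and Q are positive integers.
In the embodiments of this application, the transmitting end performs single-bit ECC encoding on the N groups of data bit streams. It then interleaves the resulting N groups of ECC coded bit streams so that the N bits in each unit of the interleaved bit stream correspond to different ECCs, and performs PAM encoding at unit granularity. When bit errors occur within a unit, they are therefore spread across different ECCs, and the receiving end can correct multi-bit errors with single-bit error correction. Compared with multi-bit error correction or retransmission, the transmitting end achieves lower processing latency and smaller area and power consumption.
In a possible implementation, the device may further include a data grouper, configured to group an original data bit stream into the N groups of data bit streams and pass the N groups of data bit streams to the ECC encoder.
In this way, the original data bit stream is divided into N groups of data bit streams, each encoded with a different ECC, which improves the reliability of the solution.
In a possible implementation, all IO data streams in the Q groups of IO data streams have the same data length. In other words, the IO data distributor distributes the units of the first coded bit stream evenly across the Q IO pins.
This increases the parallelism of data processing and further improves data processing efficiency.
In a possible implementation, the single-bit error-correcting ECC encoding includes an extended Hamming code for implementing SEC-DED, for example, an optimal minimum odd-weight-column code. This is merely an example; other extended Hamming codes that implement SEC-DED may also be used.
In a possible implementation, N is any one of 3, 4, 6, 8, or 16.
In this way, single-bit ECC encoding can be applied in PAM scenarios such as PAM-3, PAM-4, PAM-8, and PAM-16.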
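According to a second aspect, a data transmission device is provided. The device may be a receiving end or may be located at a receiving end, and includes the following components:
Q IO pins, configured to receive Q groups of PAM coded bit streams respectively;
a PAM decoder, configured to perform PAM-N decoding on the Q groups of PAM coded bit streams to obtain Q groups of IO data streams;
a first aggregator, configured to aggregate the Q groups of IO data streams to obtain a first coded bit stream, in which every N coded bits form a unit;
a deinterleaver, configured to read coded bits from the units of the first coded bit stream and reorganize them into N groups of ECC coded bit streams. The N coded bits in each unit are allocated to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits;
and an ECC decoder, configured to perform single-bit error-correcting ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams.
N, M, P, and Q are positive integers.
In the embodiments of this application, the receiving end performs PAM-N decoding on the Q groups of PAM coded bit streams and then aggregates and deinterleaves the decoded IO data streams, so that the coded bits of each unit are distributed to different ECCs. When bit errors occur within a unit, they are spread across different ECCs, and the receiving end can correct multi-bit errors with single-bit error correction. Compared with multi-bit error correction or retransmission, the receiving end achieves lower processing latency and smaller area and power consumption.
In a possible implementation, the device may further include a second aggregator, configured to aggregate the N groups of data bit streams into the original data bit stream.
With this implementation, the original data bit stream can be recovered, which improves the reliability of the solution.
In a possible implementation, all IO data streams in the Q groups of IO data streams have the same data length.
In a possible implementation, the single-bit error-correcting ECC encoding includes an extended Hamming code for implementing SEC-DED, for example, an optimal minimum odd-weight-column code.
In a possible implementation, N is any one of 3, 4, 6, 8, or 16.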
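According to a third aspect, a data transmission method is provided, including the following steps:
performing single-bit error-correcting ECC encoding on N groups of data bit streams to obtain N groups of ECC coded bit streams. Each of the N groups of data bit streams includes M data bits, and the coded bits in each of the N groups of ECC coded bit streams include the M data bits and P check bits;
reading coded bits from the N groups of ECC coded bit streams in a polling manner and reorganizing them into a first coded bit stream. Every N coded bits in the first coded bit stream form a unit, and the N coded bits in each unit come from different ECC coded bit streams;
distributing the units of the first coded bit stream to Q IO pins to obtain Q groups of IO data streams;
performing PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM coded bit streams;
and sending the Q groups of PAM coded bit streams through the Q IO pins.
N, M, P, and Q are positive integers.
In a possible implementation, the method further includes: grouping an original data bit stream to obtain the N groups of data bit streams.
In a possible implementation, all IO data streams in the Q groups of IO data streams have the same data length.
In a possible implementation, the single-bit error-correcting ECC encoding includes an extended Hamming code for implementing SEC-DED, for example, an optimal minimum odd-weight-column code.
In a possible implementation, N is any one of 3, 4, 6, 8, or 16.
According to a fourth aspect, a data transmission method is provided, including the following steps:
receiving Q groups of PAM coded bit streams through Q IO pins;
performing PAM-N decoding on the Q groups of PAM coded bit streams to obtain Q groups of IO data streams;
aggregating the Q groups of IO data streams to obtain a first coded bit stream, in which every N coded bits form a unit;
reading coded bits from the units of the first coded bit stream and reorganizing them into N groups of ECC coded bit streams. The N coded bits in each unit are allocated to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits;
and performing single-bit error-correcting ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams.
N, M, P, and Q are positive integers.
In a possible implementation, the method further includes: aggregating the N groups of data bit streams into the original data bit stream.
In a possible implementation, all IO data streams in the Q groups of IO data streams have the same data length.
In a possible implementation, the single-bit error-correcting ECC encoding includes an extended Hamming code for implementing SEC-DED, for example, an optimal minimum odd-weight-column code.
In a possible implementation, N is any one of 3, 4, 6, 8, or 16.
According to a fifth aspect, a computer-readable storage medium is provided. The storage medium stores instructions. When the instructions are executed, they implement the method according to the third aspect or any possible implementation of the third aspect, or the method according to the fourth aspect or any possible implementation of the fourth aspect.
According to a sixth aspect, a computer program product is provided. The computer program product stores instructions. When the instructions run on a computer, the computer performs the method according to the third aspect or any possible implementation of the third aspect, or the method according to the fourth aspect or any possible implementation of the fourth aspect.
According to a seventh aspect, a data transmission system is provided. It includes the device according to the first aspect or any possible implementation of the first aspect, and the device according to the second aspect or any possible implementation of the second aspect.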
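Brief Description of Drawings
FIG. 1 is a schematic diagram of channel transmission with PAM-3 encoding and with NRZ encoding;
FIG. 2 is a schematic diagram of NRZ encoding;
FIG. 3A is a schematic diagram of PAM-3 encoding;
FIG. 3B is a schematic diagram showing how an error in a single UI causes multiple bit errors after PAM-3 decoding;
FIG. 4 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 5 is a schematic diagram of a data transmission system according to an embodiment of this application;
FIG. 6A is a schematic diagram of the data processing flow in the data transmission device 01;
FIG. 6B is a schematic diagram of the data processing flow in the data transmission device 02;
FIG. 7 is a schematic diagram of a data transmission device 01 according to an embodiment of this application;
FIG. 8 is a schematic diagram of data interleaving according to an embodiment of this application;
FIG. 9 is a schematic diagram of IO data distribution according to an embodiment of this application;
FIG. 10 is a schematic diagram of another data transmission device 01 according to an embodiment of this application;
FIG. 11 is a schematic diagram of another data transmission device 01 according to an embodiment of this application;
FIG. 12 is a schematic diagram of a specific data processing example;
FIG. 13 is a schematic diagram of a data transmission device 02 according to an embodiment of this application;
FIG. 14 is a schematic diagram of IO data aggregation according to an embodiment of this application;
FIG. 15 is a schematic diagram of data deinterleaving according to an embodiment of this application;
FIG. 16 is a schematic diagram of another data transmission device 02 according to an embodiment of this application;
FIG. 17 is a schematic diagram of another data transmission device 02 according to an embodiment of this application;
FIG. 18 is a schematic diagram of a specific data processing example;
FIG. 19 is a schematic diagram of a specific single-bit error correction example.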
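Detailed Description of Embodiments
To make the technical solutions of the embodiments of this application easier to understand, some technical terms used herein are introduced first.
1) Pulse Amplitude Modulation (PAM): a modulation scheme in which the amplitude of a pulse carrier varies with the baseband signal. PAM comes in several variants, classified by modulation level. For ease of description, PAM-N denotes N-level pulse amplitude modulation herein, where N is a positive integer. The level N is the number of distinct level values. For example, PAM-3 has three level values: high, middle, and low.
2) Non-Return-to-Zero (NRZ) encoding: also called two-level pulse amplitude modulation (PAM-2) encoding, it uses two voltage levels to represent logic 0 and logic 1, for example, a positive level for 1 and a low level for 0. Unlike Return-to-Zero (RZ) encoding, it does not return to zero. The whole period can therefore carry data, which makes full use of the transmission bandwidth. Traditional digital signals mostly use NRZ, that is, two signal levels represent the 1 and 0 of a digital logic signal, and each symbol period (also called a UI) carries 1 bit of logic information. For example, FIG. 2 is a schematic diagram of NRZ encoding.
3) Three-level pulse amplitude modulation (PAM-3) encoding: uses 2 ternary level values to represent 3 bits of logic information, so every two symbol periods carry 3 bits. For example, FIG. 3A is a schematic diagram of PAM-3 encoding. 3 bits have 8 combinations (2 cubed), while two level values have 9 combinations (3 squared). The latter covers the former, so two levels can represent 3 bits of logic information, and each level effectively carries 1.5 bits.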
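As a quick arithmetic check of the figures above, the following Python sketch compares bits per UI for NRZ and PAM-3. The helper name `bits_per_ui` is illustrative only. It checks that the level combinations cover every bit pattern (8 ≤ 9 for PAM-3) and then divides bits by UIs.

```python
# Compare bits per UI for NRZ (PAM-2) and PAM-3, as described for FIG. 1 and FIG. 3A.
import math

def bits_per_ui(levels: int, symbols: int, bits: int) -> float:
    """Return bits per UI when `bits` bits are carried by `symbols` UIs of a `levels`-level signal.

    Raises ValueError if the level combinations cannot represent every bit pattern.
    """
    if levels ** symbols < 2 ** bits:
        raise ValueError(f"{levels}^{symbols} level combinations < 2^{bits} bit patterns")
    return bits / symbols

nrz = bits_per_ui(levels=2, symbols=1, bits=1)    # NRZ: 1 bit in 1 UI
pam3 = bits_per_ui(levels=3, symbols=2, bits=3)   # PAM-3: 3 bits in 2 UIs (8 patterns <= 9 level pairs)
print(pam3, pam3 / nrz)   # 1.5 bits per UI, i.e. 1.5x NRZ at the same baud rate
print(math.log2(3))       # ceiling for a single ternary symbol (about 1.585 bits)
```

The 3-bits-in-2-UIs packing gives up a little of the ternary ceiling (one of the nine level pairs goes unused) in exchange for a simple fixed-size mapping.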
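Besides PAM-3 encoding, there are other multi-level PAM schemes, such as PAM-4, PAM-6, and PAM-8, which are not described individually here.
4) Chiplet: also called a small chip, a pre-manufactured die that has a specific function and can be combined and integrated with others. Dies with specific functions (that is, multiple module chips) can be packaged together with an underlying base chip by using die-to-die interconnection technology to form a system chip. Chiplets can be interconnected through multiple channels to transmit data; a channel may be implemented as an input/output (Input Output, IO or I/O) pin. This improves performance while achieving low cost and high yield.
5) In the embodiments of this application, the terms "system" and "network" may be used interchangeably. "At least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression refers to any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may indicate a, b, c, a and b, b and c, a and c, or a, b, and c.
In addition, unless otherwise stated, ordinal numbers such as "first" and "second" in the embodiments of this application distinguish between multiple objects. They do not limit the order, timing, priority, or importance of those objects. For example, a first priority criterion and a second priority criterion merely distinguish two criteria and do not indicate a difference in their content, priority, or importance.
Moreover, the terms "include" and "have" in the embodiments, claims, and drawings of this application are not exclusive. For example, a process, method, system, product, or device that includes a series of steps or modules is not limited to the listed steps or modules and may further include steps or modules that are not listed.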
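FIG. 4 is a schematic diagram of an application scenario according to an embodiment of this application. The scenario shows a chiplet interconnection system that includes at least two chiplets. FIG. 4 shows two chiplets, chiplet A and chiplet B, but the system is not limited thereto. Chiplets may be interconnected through one or more channels to transmit data. FIG. 4 takes chiplet A sending data to chiplet B as an example, so chiplet A is the transmitting end and chiplet B is the receiving end.
The interconnection channels between chiplets are in-package channels, which achieve a lower bit error ratio than chip-to-chip interconnection channels. The industry generally specifies a bit error ratio (Bit Error Ratio, BER) of less than 1e-15 before error correction. Transmission latency in a chiplet interconnection system affects overall system performance. Error correction should therefore add as little latency as possible, and the bit error ratio of the interconnect should be reduced at the lowest possible implementation cost.
In a chiplet interconnection system, data can be encoded with a multi-level scheme to increase channel interconnection bandwidth. With multi-level encoding, however, a raw error in 1 UI of the channel may produce multi-bit errors after decoding. For example, with PAM-3 encoding every 2 UIs in the channel carry 3 bits of data, so a raw error in 1 UI may produce 2 to 3 bit errors after decoding.
FIG. 3B is a schematic diagram showing how an error in a single UI causes multiple bit errors after PAM-3 decoding. The receiving end samples the level of each UI: 00 denotes the low level (-1), 01 the middle level (0), and 11 the high level (1). The levels of two consecutive UIs recover 3 data bits. For example, if the transmitting end sends the original data 010, the correct levels of the 2 UIs are 00 (low) and 01 (middle). Suppose the first UI is incorrectly sampled as the middle level 01, so two middle levels are received. This pattern coincides with the pattern for "101", so PAM-3 decodes it as "101", which differs from the correct data "010" in all 3 bits.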
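A small lookup-table sketch reproduces the FIG. 3B example. Only two entries come from the text: 010 maps to (low, middle) and 101 maps to (middle, middle). The remaining entries of `PAM3_ENCODE` are a hypothetical assignment for illustration and are not the mapping of FIG. 3A.

```python
# Hypothetical PAM-3 table: only "010" -> (-1, 0) and "101" -> (0, 0) come from the text (FIG. 3B);
# the other entries are illustrative assumptions. Level pair (0, 1) is left unused.
PAM3_ENCODE = {
    "000": (-1, -1), "001": (-1, 1), "010": (-1, 0), "011": (0, -1),
    "100": (1, -1), "101": (0, 0), "110": (1, 0), "111": (1, 1),
}
PAM3_DECODE = {levels: bits for bits, levels in PAM3_ENCODE.items()}

def encode(bits: str) -> list[int]:
    """Encode a bit string (length a multiple of 3) into ternary levels, 2 UIs per 3 bits."""
    if len(bits) % 3:
        raise ValueError("bit string length must be a multiple of 3")
    levels: list[int] = []
    for i in range(0, len(bits), 3):
        levels.extend(PAM3_ENCODE[bits[i:i + 3]])
    return levels

def decode(levels: list[int]) -> str:
    """Decode level pairs back into 3-bit groups; an unused pair raises KeyError."""
    return "".join(PAM3_DECODE[(levels[i], levels[i + 1])] for i in range(0, len(levels), 2))

def bit_errors(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

sent = encode("010")          # [-1, 0]
received = [0] + sent[1:]     # first UI sampled as the middle level instead of low
decoded = decode(received)    # "101"
print(decoded, bit_errors("010", decoded))   # 101 3
```

A single wrong UI turns one valid level pair into another valid pair, so every bit of the 3-bit group can flip at once.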
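The receiving end therefore needs to perform error correction to reduce the bit error ratio of the interconnect interface.
In one implementation, a Reed-Solomon (Reed-Solomon, RS) (450,406) code can be used to detect and correct errors in the chiplet interconnection system. In this RS scheme, 9 bits form one symbol, and 44 check symbols are added to 406 symbols of original data, which allows correction of up to 22 symbol errors. At the transmitting end, 45 data blocks of 81 bits plus 9 bits of Operations, Administration and Maintenance (OAM) information form 406 symbols. RS(450,406) encoding then adds 44 symbols, producing 450 symbols in total. At the receiving end, RS(450,406) decoding of the 450 symbols outputs 406 corrected symbols of original data, recovering the 45 81-bit data blocks and the 9-bit OAM.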
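The RS(450,406) sizing can be checked with plain arithmetic. The sketch relies on the standard Reed-Solomon property that n - k check symbols correct up to (n - k)/2 symbol errors, which matches the 22-symbol figure quoted above.

```python
# Sizing arithmetic for the RS(450,406) scheme described above (9-bit symbols).
SYMBOL_BITS = 9
n, k = 450, 406

parity_symbols = n - k            # 44 check symbols
t = parity_symbols // 2           # RS corrects up to (n - k) / 2 symbol errors -> 22
payload_bits = 45 * 81 + 9        # 45 data blocks of 81 bits plus 9 bits of OAM
codeword_bits = n * SYMBOL_BITS   # bits sent per codeword

assert payload_bits == k * SYMBOL_BITS   # 3654 bits fill exactly 406 symbols
print(parity_symbols, t, payload_bits, codeword_bits)   # 44 22 3654 4050
```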
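In this implementation, RS encoding and decoding are complex, so the processing latency is large (on the order of tens of ns) and cannot meet the low-latency requirements of chiplet interface interconnection.
In another implementation, BCH codes (Bose-Chaudhuri-Hocquenghem codes) can be used to detect and correct errors in the chiplet interconnection system. BCH codes are commonly used to correct multiple random errors and can be designed for a given number of correctable bits and information bit length. For a BCH(n,k,t) code, n is the codeword length, t is the number of correctable bits, k is the number of information bits, n = 2^m - 1, and the number of check bits to be added is n - k = mt. The value of m is used to derive the number of check bits for a primitive BCH code (a BCH code of length n = 2^m - 1) and the code length of a non-primitive BCH code (a BCH code whose length is a factor of 2^m - 1).
Primitive BCH example: BCH(8191,7151,80) has code length n = 8191, k = 7151 information bits, and t = 80 correctable bits. Given n and t for a primitive BCH code, k is derived as follows: since 8191 = 2^13 - 1, m = 13; the check bits to be added are n - k = mt = 13*80 = 1040; the information bits are k = n - 1040 = 8191 - 1040 = 7151.
Non-primitive BCH example: BCH(9312,8192,80) has code length n = 9312, k = 8192 information bits, and t = 80 correctable bits. Given k and t for a non-primitive BCH code, n is derived as follows. First determine m: because k = 8192 = 2^13, and in general n - k < k and n <= 2^m - 1, m = 14. The check bits to be added are then n - k = mt = 14*80 = 1120, and the code length is n = k + 1120 = 8192 + 1120 = 9312.
A PAM-3-based chiplet interconnection system must be able to correct at least 2-bit errors. For example, BCH(255,239,2) corrects 2 bits with 239 information bits, m = 8, and 255 - 239 = 8*2 = 16 check bits.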
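The BCH parameter derivations above can be reproduced as follows. The sketch uses the text's relation n - k = m·t; for binary BCH codes m·t is in general an upper bound on the check-bit count. The helper names are hypothetical.

```python
# Reproduce the BCH parameter arithmetic above, using the text's relation n - k = m * t.

def primitive_bch(m: int, t: int) -> tuple[int, int]:
    """Return (n, k) for a primitive BCH code of length n = 2^m - 1 correcting t bits."""
    n = 2 ** m - 1
    return n, n - m * t

def bch_for_info_bits(k: int, t: int) -> tuple[int, int]:
    """Return (m, n): the smallest m for which n = k + m*t still fits within 2^m - 1."""
    m = 1
    while k + m * t > 2 ** m - 1:
        m += 1
    return m, k + m * t

print(primitive_bch(13, 80))          # (8191, 7151) -> BCH(8191, 7151, 80)
print(bch_for_info_bits(8192, 80))    # (14, 9312)   -> BCH(9312, 8192, 80)
print(primitive_bch(8, 2))            # (255, 239)   -> BCH(255, 239, 2)
```

For k = 8192 and t = 80, m = 13 fails because 8192 + 1040 = 9232 exceeds 8191, so the loop settles on m = 14, matching the text.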
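In this implementation, at least 2-bit errors must be corrected. BCH decoding is complex, and the processing latency is still large (on the order of ns).
In another implementation, errors are corrected by retransmitting data after a transmission error. At the transmitting end, a sequence number and a cyclic redundancy check (Cyclic Redundancy Check, CRC) code are added to the data, and the sent data is kept in a transmit buffer. The receiving end checks the received data bits with the CRC. If the data is correct, it returns an acknowledgment (ACK) to the transmitting end, which then releases the data from the transmit buffer. If the data is erroneous, the receiving end returns a negative acknowledgment (NACK) carrying the corresponding sequence number, and the transmitting end resends the packet with that sequence number.
This implementation has several costs. Every transmitted packet must be buffered, and the receiving end must return ACK/NACK information in real time. A bidirectional link is mandatory, and the acknowledgments consume bandwidth on the reverse path, so implementation cost and power consumption are high. In addition, an erroneous packet can only be corrected after the transmitting end retransmits it, so the processing latency is still large.
Multi-bit error correction and retransmission therefore both have a large processing latency and cannot meet the low-latency requirements of chiplet interface interconnection.
In view of this, the embodiments of this application provide technical solutions that perform error correction in a chiplet interconnection system with single-bit error correction. Compared with multi-bit error correction or retransmission, these solutions achieve lower latency and smaller area and power consumption.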
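FIG. 5 is a schematic diagram of a data transmission system according to an embodiment of this application. The system includes a data transmission device 01 and a data transmission device 02.
The data transmission device 01 is the transmitting end or a device located at the transmitting end, for example, chiplet A in the scenario shown in FIG. 4. The data transmission device 01 includes an error-correcting code (Error-Correcting Codes, ECC) encoder 11, an interleaver 12, an IO data distributor 13, and a PAM encoder 14.
The data transmission device 02 is the receiving end or a device located at the receiving end, for example, chiplet B in the scenario shown in FIG. 4. The data transmission device 02 includes a PAM decoder 21, a first aggregator 22, a deinterleaver 23, and an ECC decoder 24.
The data transmission system further includes a plurality of IO pins. Each IO pin has three parts: one part is located in the data transmission device 01, and another is located in the data transmission device 02. The third part connects the two devices and serves as the interconnection channel between them, carrying the data that the data transmission device 01 sends to the data transmission device 02.
The data transmission device 01 applies single-bit ECC encoding, interleaving, PAM-N encoding, and other processing to the original data, then transmits the processed data stream to the data transmission device 02 over the channel. The data transmission device 02 applies PAM-N decoding, deinterleaving, single-bit ECC decoding, and other processing to the received data to recover the original data.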
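The following describes the data processing solution in the data transmission device 01.
FIG. 6A is a schematic diagram of the data processing flow in the data transmission device 01.
S601A: The ECC encoder 11 performs single-bit error-correcting ECC encoding on N groups of data bit streams to obtain N groups of ECC coded bit streams.
N is a positive integer greater than 2, for example, any one of 3, 4, 6, 8, or 16.
Each of the N groups of data bit streams includes M data bits, and the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits.
The number of groups of data bit streams in step S601A (that is, N) corresponds to the encoding scheme of the PAM encoder 14 in step S604A. For example, N = 3 if the PAM encoder 14 uses PAM-3 encoding, and N = 4 if it uses PAM-4 encoding.
Optionally, the single-bit error-correcting ECC encoding includes an extended Hamming code for implementing single-error correction and double-error detection (Single-error correction and double-error detecting, SEC-DED). Examples are an optimal minimum odd-weight-column code or another extended Hamming code implementing SEC-DED, which is not limited in this application. Take the optimal minimum odd-weight-column code with PAM-3 encoding as an example. If each data bit stream includes 120 data bits, 8 check bits can be added to each data bit stream, and each ECC coded bit stream outputs 128 bits.
In the embodiments of this application, different ECC coded bit streams (or data bit streams) correspond to different error-correcting codes, that is, different ECCs (for example, optimal minimum odd-weight-column codes).
Optionally, there are N ECC encoders 11, as shown in FIG. 7. The N ECC encoders 11 perform single-bit error-correcting ECC encoding on the N groups of data bit streams respectively. Different ECC encoders 11 use different ECCs, so different ECC coded bit streams correspond to different ECCs. For example, the ECC encoder 11-1 encodes the 1st group of data bit streams into the 1st group of ECC coded bit streams, and the ECC encoder 11-2 encodes the 2nd group of data bit streams into the 2nd group of ECC coded bit streams. This continues up to the ECC encoder 11-N, which encodes the Nth group of data bit streams into the Nth group of ECC coded bit streams. Each encoder applies single-bit error-correcting ECC encoding.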
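S602A: The interleaver 12 reads coded bits from the N groups of ECC coded bit streams in a polling manner and reorganizes them into a first coded bit stream.
Reading coded bits in a polling manner means performing multiple rounds of read operations on the N groups of ECC coded bit streams. One round is the process in which the interleaver 12 performs one read operation on each of the N groups. Specifically, in each round the interleaver 12 reads 1 bit from each of the N groups, so each round reads N bits in total, one from each group. After completing one round, the interleaver 12 performs the next round, until all coded bits in the N groups of ECC coded bit streams have been read.
Optionally, the interleaver 12 reads the N groups in the same order in every round; this order may be called the first polling order. Taking N = 3 as an example, the first polling order may be the 1st, 2nd, and then 3rd group of ECC coded bit streams. This is only an example and not a limitation.
Because the N bits read in each round come from the N different groups of ECC coded bit streams, the first coded bit stream can be divided, in bit order, into units of N coded bits. The N coded bits in each unit come from different ECC coded bit streams.
Taking N = 3 as an example, bits 1 to 3 form a unit (one bit from each of the 3 groups of ECC coded bit streams), bits 4 to 6 form a unit (again one bit from each group), and so on.
Further optionally, the N coded bits read by the interleaver 12 in each round occupy the same relative position within their respective ECC coded bit streams.
Referring to FIG. 8, take N = 3 and the first polling order 1st, 2nd, 3rd group of ECC coded bit streams as an example. The 1st group of ECC coded bit streams is A_0/A_1/A_2..., the 2nd group is B_0/B_1/B_2..., and the 3rd group is C_0/C_1/C_2.... In round 1, the interleaver 12 reads, in sequence, the 1st bit of the 1st group (A_0), the 1st bit of the 2nd group (B_0), and the 1st bit of the 3rd group (C_0). In round 2, it reads the 2nd bit of each group (A_1, B_1, C_1) in the same order, and so on. The resulting first coded bit stream is A_0/B_0/C_0/A_1/B_1/C_1/.../A_(M+P-1)/B_(M+P-1)/C_(M+P-1).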
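A minimal sketch of the round-robin interleaver of S602A, under two assumptions: all N ECC coded bit streams have the same length M + P, and the first polling order is 1st, 2nd, ..., Nth group. String labels stand in for bits so the FIG. 8 ordering is visible; the function name is hypothetical.

```python
def interleave(streams: list[list]) -> list:
    """Round-robin interleaver sketch: each round reads bit j from stream 1, 2, ..., N.

    Every N consecutive output bits (one unit) therefore come from N different ECC coded
    bit streams, and the bits of a unit share the same position j in their own streams.
    """
    lengths = {len(s) for s in streams}
    if len(lengths) != 1:
        raise ValueError("all ECC coded bit streams must have the same length (M + P bits)")
    # zip(*streams) yields (stream1[j], stream2[j], ..., streamN[j]) for j = 0, 1, ...
    return [bit for column in zip(*streams) for bit in column]

groups = [["A_0", "A_1", "A_2"], ["B_0", "B_1", "B_2"], ["C_0", "C_1", "C_2"]]
print(interleave(groups))
# ['A_0', 'B_0', 'C_0', 'A_1', 'B_1', 'C_1', 'A_2', 'B_2', 'C_2']
```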
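Reorganizing the coded bits read from the N groups of ECC coded bit streams into the first coded bit stream is essentially a rearrangement of the coded bits, so this process can also be described as "interleaving".
S603A: The IO data distributor 13 distributes the units of the first coded bit stream to Q IO pins to obtain Q groups of IO data streams.
The IO data distributor 13 distributes the bits of the first coded bit stream at unit granularity. This ensures that the N bits sent together to an IO pin (that is, the N bits of each unit) come from different ECC coded bit streams and correspond to different ECCs.
Continuing with the first coded bit stream shown in FIG. 8, FIG. 9 is a schematic diagram of distributing the units of the first coded bit stream to the Q IO pins.
Optionally, the IO data distributor 13 distributes the units of the first coded bit stream evenly across the Q IO pins; in other words, all IO data streams in the Q groups have the same data length. This increases the parallelism of data processing and further improves processing efficiency.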
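The unit-granular, even distribution of S603A might look like the following sketch. The round-robin order (unit u goes to pin u mod Q) is an assumption; FIG. 9 may use a different order. The function name `distribute` is hypothetical.

```python
def distribute(stream: list, n: int, q: int) -> list[list]:
    """Deal N-bit units to Q pins round-robin (unit u -> pin u % q); the order is an assumption.

    Requiring the unit count to be a multiple of q gives every pin the same data length.
    """
    if len(stream) % n:
        raise ValueError("stream length must be a multiple of the unit size N")
    units = [stream[i:i + n] for i in range(0, len(stream), n)]
    if len(units) % q:
        raise ValueError("units cannot be spread evenly over the Q pins")
    pins: list[list] = [[] for _ in range(q)]
    for u, unit in enumerate(units):
        pins[u % q].extend(unit)   # a unit is never split across pins
    return pins

first_stream = ["A_0", "B_0", "C_0", "A_1", "B_1", "C_1", "A_2", "B_2", "C_2", "A_3", "B_3", "C_3"]
for pin, data in enumerate(distribute(first_stream, n=3, q=2)):
    print(f"IO {pin}:", data)
# IO 0: ['A_0', 'B_0', 'C_0', 'A_2', 'B_2', 'C_2']
# IO 1: ['A_1', 'B_1', 'C_1', 'A_3', 'B_3', 'C_3']
```

Because whole units are dealt out, any 3-bit group that a PAM-3 encoder later maps to one pair of UIs still holds one bit from each ECC.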
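S604A: The PAM encoder 14 performs PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM coded bit streams.
Specifically, the PAM encoder 14 encodes at unit granularity. Taking N = 3 as an example, each unit includes 3 bits that come from three different ECC coded bit streams and correspond to different ECCs. With PAM-3 encoding, the 3 bits of each unit are represented by 2 ternary level values. For the encoding principle, refer to the content related to FIG. 3A above; details are not repeated here.
Optionally, as shown in FIG. 10, each IO pin corresponds to one PAM encoder 14. With Q IO pins there are therefore Q PAM encoders 14, and each PAM encoder performs PAM-N encoding on the IO data stream of its corresponding IO pin.
Finally, the Q groups of PAM coded bit streams are sent out through the Q IO pins respectively.
N, M, P, and Q are all positive integers.
Optionally, as shown in FIG. 11, the data transmission device 01 further includes a data grouper 15, configured to group the original data bit stream into N groups of data bit streams and pass the N groups of data bit streams to the ECC encoder 11.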
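For a clearer understanding of the foregoing processing flow, a complete example follows.
Take 120*3 = 360 bits of original data as an example. The data transmission device 01 and the data transmission device 02 are interconnected through 64 IO pins (IO 0 to IO 63). Before sending, the data transmission device 01 applies grouping, ECC encoding, interleaving, IO data distribution, and PAM-3 encoding to the original data. Referring to FIG. 12, the processing flow is as follows:
The data grouper 15 divides the 360 bits of original data evenly into 3 groups of 120 bits each, namely DA[119:0], DB[119:0], and DC[119:0]. FIG. 12 denotes the original data as D[120*3-1:0], that is, the 360 bits numbered 0 to 120*3-1. Each group corresponds to one ECC encoder 11: ECC encoder A, ECC encoder B, and ECC encoder C, respectively.
Each ECC encoder 11 uses the minimum odd-weight-column code as its ECC algorithm. It adds 8 check bits (C[7:0]) to each group of 120 input data bits (D[119:0]), outputting 128 bits in total. The ECC encoding is computed as {C[7:0],D[119:0]} = D[119:0]*G. Here G is a 120*128 matrix composed of the 120*120 identity matrix I2 and the transpose of the 8*120 matrix H1, that is:
G = [I2, H1^T];
[Images PCTCN2022128727-appb-000001 and PCTCN2022128727-appb-000002: matrix definitions, not reproduced in this text]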
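Because H1 itself is only given as an image, the sketch below builds a hypothetical 8×120 H1. It takes all odd-weight columns of weight 3, 5, and 7 (56 + 56 + 8 = 120), in the spirit of a minimum odd-weight-column code; it is not the patent's matrix. It then computes the check bits as the H1^T part of D·G over GF(2). List index j stands for D[j].

```python
from itertools import combinations
import random

R, K = 8, 120   # check bits P and data bits M per group

def build_h1() -> list[list[int]]:
    """Hypothetical 8x120 H1: all distinct odd-weight columns of weight 3, 5 and 7 (56 + 56 + 8).

    Follows the minimum odd-weight-column idea; it is NOT the matrix in the patent's images.
    """
    cols = [[1 if r in rows else 0 for r in range(R)]
            for w in (3, 5, 7) for rows in combinations(range(R), w)]
    assert len(cols) == K
    return [[col[r] for col in cols] for r in range(R)]   # transpose: 8 rows of 120 entries

H1 = build_h1()

def ecc_encode(data: list[int]) -> tuple[list[int], list[int]]:
    """Return (C, D) for {C[7:0], D[119:0]} = D * G with G = [I2, H1^T], over GF(2).

    The I2 part copies D unchanged; the H1^T part gives C[r] = sum_j D[j] * H1[r][j] mod 2.
    """
    if len(data) != K:
        raise ValueError("expected 120 data bits")
    check = [sum(h & d for h, d in zip(row, data)) % 2 for row in H1]
    return check, list(data)

rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(K)]
check, _ = ecc_encode(data)
# Every codeword has a zero syndrome: H1 * D^T + I1 * C^T = 0 (mod 2).
print(all((sum(h & d for h, d in zip(row, data)) + c) % 2 == 0 for row, c in zip(H1, check)))  # True
```

Using only odd-weight columns, all distinct, is what lets a single error be located (its syndrome equals one column) while any double error produces an even-weight syndrome that matches no column.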
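The interleaver 12 reorganizes the three groups of 128-bit data output by the three ECC encoders 11 ({CC[7:0],DC[119:0]}, {CB[7:0],DB[119:0]}, {CA[7:0],DA[119:0]}) into one group of parallel data, that is, the first coded bit stream. The interleaving polls the three groups and takes 1 bit from a group each time, and so on. The final output is: CC[7],CB[7],CA[7],...,CC[0],CB[0],CA[0],DC[119],DB[119],DA[119],...,DC[1],DB[1],DA[1],DC[0],DB[0],DA[0].
The IO data distributor 13 distributes the interleaved data to the 64 IO pins in bit order, in units of 3 consecutive bits, so each IO pin carries 6 bits of IO data.
For example:
IO 0: DC[64],DB[64],DA[64],DC[0],DB[0],DA[0];
IO 1: DC[65],DB[65],DA[65],DC[1],DB[1],DA[1];
...
IO 63: CC[7],CB[7],CA[7],DC[63],DB[63],DA[63].
The above distribution scheme is only an example and not a limitation.
The PAM-3 encoder 14 performs PAM-3 encoding on the data of each IO pin, so every 3 data bits are transmitted in 2 UIs. This yields 64 groups of PAM coded bit streams, each consisting of 4 PAM-3 symbols (4 UIs).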
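The 64-pin layout can be regenerated symbolically. This sketch makes two assumptions. First, it uses the interleaving order listed above (group C, then B, then A within each unit). Second, it uses a round-robin distribution in which the u-th unit, counted from the least significant end (unit 0 = DC[0], DB[0], DA[0]), goes to IO u mod 64. Under these assumptions it reproduces the IO 0 and IO 63 entries of the example, and it gives DC[65],DB[65],DA[65],DC[1],DB[1],DA[1] for IO 1.

```python
Q = 64   # IO pins IO 0 .. IO 63
N = 3    # PAM-3 unit size

def labels(group: str) -> list[str]:
    """MSB-first bit labels of one 128-bit codeword {C[7:0], D[119:0]} for group 'A', 'B' or 'C'."""
    return [f"C{group}[{i}]" for i in range(7, -1, -1)] + [f"D{group}[{i}]" for i in range(119, -1, -1)]

# Interleave: each round takes one bit from group C, then B, then A (the listed output order).
codewords = [labels(g) for g in ("C", "B", "A")]
first_stream = [bit for column in zip(*codewords) for bit in column]   # 384 labels, MSB first

# Units counted from the LSB end: units[0] = DC[0], DB[0], DA[0]; units[127] = CC[7], CB[7], CA[7].
units = [first_stream[i:i + N] for i in range(0, len(first_stream), N)][::-1]

# Assumed round-robin distribution: unit u goes to IO u % 64, so each pin gets 2 units (6 bits).
pins: list[list[list[str]]] = [[] for _ in range(Q)]
for u, unit in enumerate(units):
    pins[u % Q].append(unit)

def show(pin: int) -> str:
    """A pin's 6 bits listed MSB-first, in the notation of the example."""
    return ",".join(bit for unit in reversed(pins[pin]) for bit in unit)

uis_per_pin = len(pins[0]) * 2   # PAM-3 sends each 3-bit unit in 2 UIs -> 4 UIs per pin
for pin in (0, 1, 63):
    print(f"IO {pin}: {show(pin)}")
print("UIs per pin:", uis_per_pin)
```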
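Note that the above describes only several key components of the data transmission device 01 and the methods they perform. In practice, the data transmission device 01 may include other components and may apply other processing to the data before sending it. For example, before PAM encoding it may scramble or repair the data, and after PAM encoding it may perform parallel-to-serial conversion on the encoded data; this is not limited in this application.
In addition, in practice the positions of some components in the data transmission device 01 can be swapped. For example, the IO data distributor 13 can be placed after the PAM encoder 14. The data transmission device 01 then first performs PAM-N encoding on the first coded bit stream and afterwards distributes the PAM-N encoded data to the IO pins.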
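The following describes the data processing solution at the receiving end (that is, the data transmission device 02).
FIG. 6B is a schematic diagram of the data processing flow in the data transmission device 02.
S601B: The PAM decoder 21 performs PAM-N decoding on Q groups of PAM coded bit streams to obtain Q groups of IO data streams.
Specifically, the PAM decoder 21 obtains the Q groups of PAM coded bit streams from the Q IO pins respectively. These are the streams that the data transmission device 01 sent to the Q IO pins.
S601B is the reverse of S604A above. The decoding scheme of the PAM decoder 21 corresponds to the encoding scheme of the PAM encoder 14 in the data transmission device 01. N is a positive integer greater than 2, for example, any one of 3, 4, 6, 8, or 16. For example, if the PAM encoder 14 uses PAM-3 encoding, the PAM decoder 21 uses PAM-3 decoding and decodes the level values of every 2 UIs in the PAM coded bit stream into 3 bits of data. For the decoding principle, refer to the content related to FIG. 3A above; details are not repeated here.
Optionally, as shown in FIG. 13, each IO pin corresponds to one PAM decoder 21. With Q IO pins there are therefore Q PAM decoders 21, and each PAM decoder performs PAM-N decoding on the IO data stream of its corresponding IO pin.
Optionally, all groups in the Q groups of PAM coded bit streams have the same length.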
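S602B: The first aggregator 22 aggregates the Q groups of IO data streams to obtain the first coded bit stream, in which every N coded bits form a unit.
S602B is the reverse of S603A above. Specifically, the first aggregator 22 treats every N consecutive coded bits in each of the Q groups of IO data streams as a unit and reassembles the units of the Q groups into the first coded bit stream in order.
The order in which the first aggregator 22 reassembles the units corresponds to the order in which the IO data distributor 13 in the data transmission device 01 distributes them. For example, if the IO data distributor 13 distributes the units of the first coded bit stream to the Q IO pins in the order shown in FIG. 9, the first aggregator 22 reassembles the units of the Q groups of IO data streams into the first coded bit stream in the order shown in FIG. 14. This ensures that the first coded bit stream output by the first aggregator 22 has the same bit order as the first coded bit stream input to the IO data distributor 13. Taking N = 3 as an example, bits 1 to 3 of the first coded bit stream form a unit, bits 4 to 6 form a unit, and so on.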
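The following sketches the aggregation of S602B as the inverse of a round-robin distribution in which unit u goes to pin u mod Q. In round r the aggregator takes the r-th unit from IO 0, IO 1, ..., IO Q-1. That pin order is an assumption standing in for FIG. 9 and FIG. 14, and both function names are hypothetical.

```python
def distribute(stream: list, n: int, q: int) -> list[list]:
    """Transmit side (assumed order): unit u of the first coded bit stream goes to pin u % q."""
    units = [stream[i:i + n] for i in range(0, len(stream), n)]
    pins: list[list] = [[] for _ in range(q)]
    for u, unit in enumerate(units):
        pins[u % q].extend(unit)
    return pins

def aggregate(pins: list[list], n: int) -> list:
    """Receive side: round r takes the r-th N-bit unit of IO 0, IO 1, ..., IO Q-1 in turn."""
    per_pin = [[p[i:i + n] for i in range(0, len(p), n)] for p in pins]
    if len({len(units) for units in per_pin}) != 1:
        raise ValueError("all IO data streams must carry the same number of units")
    out = []
    for r in range(len(per_pin[0])):
        for units in per_pin:
            out.extend(units[r])
    return out

stream = list(range(24))             # stand-in for a 24-bit first coded bit stream (8 units, N = 3)
pins = distribute(stream, n=3, q=4)
assert aggregate(pins, n=3) == stream   # same bit order as the distributor's input
print(pins[0], pins[3])                 # [0, 1, 2, 12, 13, 14] [9, 10, 11, 21, 22, 23]
```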
可以理解,第一汇聚器22重组各单元的顺序和IO数据分发器13分发各单元的顺序, 可以由协议规定,或者其它控制装置配置,或者数据传输装置01和数据传输装置02提前约定,或者由数据传输装置01配置并告知数据传输装置02,或者由数据传输装置02配置并告知数据传输装置01,等等,本申请不做限制。
S603B、解交织器23从第一编码比特流的各个单元中读取编码比特,重组为N组ECC编码比特流。
S603B是与上文S602A相反的过程,该过程可以称为“解交织”。
可以理解,为了和上文S602A对应以及便于理解,这里也可以用轮询的方式来描述解交织器23从第一编码比特流的各个单元中读取编码比特的过程,其中解交织器23读取第一编码比特流中的一个单元并将该单元中的各个比特分配到ECC编码比特流的过程为一轮。具体来说,交织器12每一轮从第一编码比特流中读取一个单元,即连续的N个比特,然后将该N个比特按照读取顺序分别分配到N组ECC编码比特流;当解交织器23执行完一轮读取和分配的操作后,再执行下一轮读取和分配的操作,直至读取并分配完第一编码比特流中的所有单元为止。
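Taking N=3 as an example, bits 1 to 3 of the first coded bit stream form one unit and are assigned to 3 different groups of ECC coded bit streams; bits 4 to 6 form the next unit and are likewise assigned to the 3 different groups; and so on, until all coded bits in the first coded bit stream have been assigned, yielding N groups of ECC coded bit streams.
Correspondingly, the coded bits in each of the N groups of ECC coded bit streams include M data bits and P check bits.
Optionally, de-interleaver 23 reads and assigns coded bits in the same order in every round; this order may be called the second polling order. It can be understood that the second polling order corresponds to the first polling order above (that is, the order in which interleaver 12 reads coded bits from the N groups of ECC coded bit streams in each round). For example, with N=3, if the first polling order is the 1st, 2nd, and 3rd groups of ECC coded bit streams, the corresponding second polling order is the 1st coded bit of each unit (assigned to the 1st group of ECC coded bit streams), the 2nd coded bit (assigned to the 2nd group), and the 3rd coded bit (assigned to the 3rd group).
Further optionally, the coded bits read by de-interleaver 23 in one round occupy the same relative position in the ECC coded bit streams to which they are assigned.
Refer to FIG. 15, with N=3 and the second polling order being the 1st, 2nd, and 3rd coded bits of each unit. In round 1, de-interleaver 23 reads the 1st unit (bits 1 to 3) of the first coded bit stream and assigns its 1st bit (A_0) to the 1st group of ECC coded bit streams, its 2nd bit (B_0) to the 2nd group, and its 3rd bit (C_0) to the 3rd group. In round 2, de-interleaver 23 reads the 2nd unit (bits 4 to 6) and assigns its 1st bit (A_1) to the 1st group, its 2nd bit (B_1) to the 2nd group, and its 3rd bit (C_1) to the 3rd group; and so on. The three resulting groups of ECC coded bit streams are A_0/A_1/A_2…, B_0/B_1/B_2…, and C_0/C_1/C_2….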
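With a fixed second polling order, de-interleaving reduces to taking every N-th bit of the first coded bit stream, starting at offset j for the j-th group. A minimal sketch reusing the A_0/B_0/C_0 labels of FIG. 15; `deinterleave` is an illustrative name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def deinterleave(stream: Sequence[T], n: int) -> List[List[T]]:
    """Split the first coded bit stream into N ECC coded bit streams.

    The j-th bit of every N-bit unit goes to stream j, so stream[j::n]
    collects exactly the bits the interleaver took from codeword j.
    """
    if len(stream) % n:
        raise ValueError("stream length must be a multiple of N")
    return [list(stream[j::n]) for j in range(n)]

received = ["A_0", "B_0", "C_0", "A_1", "B_1", "C_1", "A_2", "B_2", "C_2"]
for group in deinterleave(received, 3):
    print("/".join(group))
# A_0/A_1/A_2
# B_0/B_1/B_2
# C_0/C_1/C_2
```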
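S604B: ECC decoder 24 performs single-bit-error-correcting ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams.
S604B is the reverse of S601A above. The ECC decoding scheme used by ECC decoder 24 corresponds to the ECC encoding scheme used by ECC encoder 11 in data transmission apparatus 01. The ECC may be, for example, an optimal minimum odd-weight-column code (Hsiao code) or another extended Hamming code that implements SEC-DED; this is not limited in this application. The ECC encoding/decoding scheme may be specified by a protocol, configured by another control apparatus, agreed in advance between data transmission apparatus 01 and data transmission apparatus 02, configured by data transmission apparatus 01 and notified to data transmission apparatus 02, or configured by data transmission apparatus 02 and notified to data transmission apparatus 01, among other options. This is not limited in this application.
Because the N bits in each unit belong to different ECC codewords, when multiple bits in one unit are in error, the errors are spread across different ECC codewords, so single-bit error correction can correct multi-bit errors.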
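The effect of interleaving on an error burst confined to one unit can be checked by counting, per ECC codeword, how many bits differ after de-interleaving. The sketch compares the interleaved layout with a hypothetical non-interleaved layout in which each 128-bit codeword occupies one contiguous block, using N = 3, 128 units, and a unit that changes from "010" to "101". The function names are illustrative.

```python
from typing import Callable, List

Splitter = Callable[[List[int], int], List[List[int]]]

def deinterleave(stream: List[int], n: int) -> List[List[int]]:
    """Interleaved layout: codeword j is bit j of every N-bit unit."""
    return [stream[j::n] for j in range(n)]

def split_contiguous(stream: List[int], n: int) -> List[List[int]]:
    """Hypothetical non-interleaved layout: codeword j is one contiguous block."""
    size = len(stream) // n
    return [stream[j * size:(j + 1) * size] for j in range(n)]

def errors_per_codeword(sent: List[int], received: List[int], n: int,
                        split: Splitter) -> List[int]:
    """Number of flipped bits that land in each codeword."""
    return [sum(a != b for a, b in zip(tx, rx))
            for tx, rx in zip(split(sent, n), split(received, n))]

N = 3
sent = [0, 1, 0] * 128          # 384 bits; every unit carries "010"
received = list(sent)
received[0:3] = [1, 0, 1]       # one corrupted unit: "010" -> "101"

print(errors_per_codeword(sent, received, N, deinterleave))      # [1, 1, 1]
print(errors_per_codeword(sent, received, N, split_contiguous))  # [3, 0, 0]
```

With interleaving, each codeword sees one error, which a single-error-correcting code can fix; without it, one codeword absorbs all three, which exceeds what a SEC-DED code can correct.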
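Optionally, there are N ECC decoders 24, as shown in FIG. 16. The N ECC decoders 24 perform single-bit-error-correcting ECC decoding on the N groups of ECC coded bit streams respectively, each decoder handling a different ECC. For example, ECC decoder 24-1 decodes the 1st group of ECC coded bit streams to obtain the 1st group of data bit streams, ECC decoder 24-2 decodes the 2nd group of ECC coded bit streams to obtain the 2nd group of data bit streams, …, and ECC decoder 24-N decodes the Nth group of ECC coded bit streams to obtain the Nth group of data bit streams.
Optionally, as shown in FIG. 17, data transmission apparatus 02 further includes second aggregator 25, configured to aggregate the N groups of data bit streams into the original data bit stream.
To make the above processing flow clearer, a complete example is given here:
Continuing the example of FIG. 12, the original data has 120*3=360 bits, and data transmission apparatus 01 and data transmission apparatus 02 are interconnected through 64 IO pins (IO 0 to IO 63). Data transmission apparatus 01 outputs 64 groups of PAM-coded bit streams; after receiving them, data transmission apparatus 02 performs PAM-3 decoding, IO data aggregation, de-interleaving, ECC decoding, and other processing. Referring to FIG. 18, the processing flow is as follows:
PAM decoder 21 performs PAM-3 decoding on the 64 groups of PAM-coded bit streams, recovering 3 data bits from every 2 UIs received on each IO, and obtains 64 groups of IO data streams: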
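IO 0: DC[64],DB[64],DA[64],DC[0],DB[0],DA[0];
IO 1: DC[65],DB[65],DA[65],DC[1],DB[1],DA[1];
……
IO 63: CC[7],CB[7],CA[7],DC[63],DB[63],DA[63].
First aggregator 22 aggregates the 64 groups of IO data streams into one data stream: taking 3 consecutive bits as one unit, it takes 3 bits at a time from the IOs in order from IO 0 to IO 63, and finally forms 384 bits of data: CC[7],CB[7],CA[7],…,CC[0],CB[0],CA[0],DC[119],DB[119],DA[119],…,DC[1],DB[1],DA[1],DC[0],DB[0],DA[0].
De-interleaver 23 divides the data output by first aggregator 22 into 3 groups of 128 bits: following the bit order, in units of 3 bits, it takes 1 bit at a time to form the three groups {CC[7:0],DC[119:0]}, {CB[7:0],DB[119:0]}, and {CA[7:0],DA[119:0]}.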
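Each of ECC decoders A, B, and C receives 128 bits of data {C[7:0],D[119:0]}, comprising 120 data bits and 8 check bits. Each ECC decoder computes the syndrome S (with arithmetic modulo 2) to determine whether the received data D[119:0] needs correction:
$S = \{C[7:0],\, D[119:0]\} \cdot H^{T}$
where H is an 8×128 matrix composed of the 8×120 matrix H1 and the 8×8 identity matrix I1:
$H = [H_1,\ I_1]$
[Matrix definitions given as images PCTCN2022128727-appb-000003 and PCTCN2022128727-appb-000004]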
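S has 8 bits in total. If S is 0, no error has occurred and D[119:0] is output directly as the decoded data. If S is nonzero and equals the value of some column of the check matrix H, the bit corresponding to that column is in error and is inverted to recover the original data. After ECC decoding, the corrected 120-bit data D[119:0] is output.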
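The syndrome check can be sketched for a (128,120) odd-weight-column SEC-DED code. Because H1 is given only as an image, the sketch assumes one: with 8 check bits there are exactly 120 odd-weight 8-bit columns of weight 3, 5, or 7, so it uses all of them, ordered by weight and then by value; the patent's column order may differ. Bit position p holds D[p] for p < 120 and C[p-120] otherwise, and over GF(2) the product with H^T reduces to XOR-ing the columns of the set bits. Besides the two cases described above, the sketch reports a nonzero syndrome matching no column (an even-weight syndrome) as uncorrectable, which is the double-error-detection half of SEC-DED. All names are illustrative.

```python
from typing import Dict, List, Tuple

def popcount(x: int) -> int:
    return bin(x).count("1")

# Assumed H1: all odd-weight 8-bit columns of weight >= 3, sorted by weight then value.
H1_COLS: List[int] = sorted(
    (c for c in range(256) if popcount(c) % 2 == 1 and popcount(c) >= 3),
    key=lambda c: (popcount(c), c),
)
I1_COLS: List[int] = [1 << j for j in range(8)]     # identity part, one per check bit
COLS: List[int] = H1_COLS + I1_COLS                  # column of H for bit position p
POS_OF_COL: Dict[int, int] = {c: p for p, c in enumerate(COLS)}
assert len(H1_COLS) == 120 and len(POS_OF_COL) == 128   # all columns distinct

def syndrome(word: List[int]) -> int:
    """S = word * H^T over GF(2): XOR of the columns at the positions of 1 bits."""
    s = 0
    for p, bit in enumerate(word):
        if bit:
            s ^= COLS[p]
    return s

def encode(data: List[int]) -> List[int]:
    """Append C[7:0] so that the syndrome of the 128-bit codeword is zero."""
    if len(data) != 120:
        raise ValueError("expected 120 data bits D[0..119]")
    s = syndrome(data + [0] * 8)
    return data + [(s >> j) & 1 for j in range(8)]   # C[j] cancels bit j of s

def decode(word: List[int]) -> Tuple[List[int], str]:
    """Return (D[0..119], status), status in 'ok', 'corrected', 'uncorrectable'."""
    s = syndrome(word)
    if s == 0:
        return word[:120], "ok"
    p = POS_OF_COL.get(s)
    if p is None:              # nonzero, matches no column: double error detected
        return word[:120], "uncorrectable"
    fixed = list(word)
    fixed[p] ^= 1              # the matching column identifies the erroneous bit
    return fixed[:120], "corrected"
```

For example, flipping any single bit of `encode(data)` makes `decode` return `(data, "corrected")`, while flipping two bits yields `"uncorrectable"`.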
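Second aggregator 25 combines the 3 groups of corrected 120-bit data, in order, into the 360-bit original data, that is, D[120*3-1:0]={DC[119:0],DB[119:0],DA[119:0]}.
It should be noted that the above describes only several key components of data transmission apparatus 02 and the methods they perform. In practical applications, data transmission apparatus 02 may further include other components and may perform other processing on the data. For example, before PAM decoding, data transmission apparatus 02 may perform serial-to-parallel conversion on the data; after PAM decoding, it may repair or descramble the decoded data; and so on. This is not limited in this application.
In addition, in practical applications, the positions of some components in data transmission apparatus 02 may be swapped. For example, first aggregator 22 may instead be placed before PAM decoder 21; that is, data transmission apparatus 02 may first aggregate the data of all IO pins and then perform PAM-N decoding on the aggregated data as a whole.
Optionally, the Q IO pins described above belong to the same lane, where a lane is the set of all physical-layer pins that share the same forwarded clock source. In other words, the method described above takes data transmission over one lane as an example. In practical applications, data transmission apparatus 01 and data transmission apparatus 02 may transmit over multiple lanes simultaneously to increase the total bit width of the transmission interface. It can be understood that, with multiple lanes, the data processing flow for each lane may follow the method described above.
In the embodiments of this application, the transmit end performs single-bit ECC encoding on the N groups of data bit streams and then interleaves the resulting N groups of ECC coded bit streams, ensuring that the N bits in each unit of the interleaved bit stream belong to different ECC codewords, and performs PAM encoding with the unit as granularity. In this way, when bit errors occur within a unit, the errors are spread across different ECC codewords, and the receive end can correct multi-bit errors using single-bit error correction.
For example, refer to FIG. 19, which is a schematic diagram of error correction. Suppose that, due to external interference, one UI on IO_0 is in error while all other UIs and IOs are normal. If "010" is sent, the receive end obtains "101" after PAM-3 decoding. Because these three bits belong to three different ECC codewords and each ECC can correct a 1-bit error, the three bits are corrected back to "010" after ECC decoding, so single-bit error correction achieves a 3-bit error-correction capability.
It can be seen that the embodiments of this application allow single-bit error correction to be used in a chiplet interconnect system, achieving lower latency and smaller area and power consumption than multi-bit error correction or retransmission.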
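Based on the same technical concept, an embodiment of this application further provides a computer-readable storage medium configured to store instructions; when the instructions are executed, the method shown in FIG. 6A or FIG. 6B is implemented.
Based on the same technical concept, an embodiment of this application further provides a computer program product storing instructions; when the instructions run on a computer, the method shown in FIG. 6A or FIG. 6B is performed.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) containing computer-usable program code.
This application is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that computer program instructions may be used to implement each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
Clearly, a person skilled in the art can make various modifications and variations to this application without departing from the protection scope of this application. If these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover them.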

Claims (27)

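  1. A data transmission apparatus, comprising:
    an error correction code (ECC) encoder, configured to perform single-bit-error-correcting ECC encoding on N groups of data bit streams to obtain N groups of ECC coded bit streams, wherein each of the N groups of data bit streams comprises M data bits, and the coded bits in each of the N groups of ECC coded bit streams comprise M data bits and P check bits;
    an interleaver, configured to read coded bits from the N groups of ECC coded bit streams by polling and reassemble them into a first coded bit stream, wherein every N coded bits in the first coded bit stream form one unit, and the N coded bits in each unit come from different ECC coded bit streams;
    an input/output (IO) data distributor, configured to distribute the units of the first coded bit stream to Q IO pins to obtain Q groups of IO data streams;
    a pulse amplitude modulation (PAM) encoder, configured to perform PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM-coded bit streams; and
    the Q IO pins, configured to send the Q groups of PAM-coded bit streams respectively;
    wherein N, M, P, and Q are positive integers.
  2. The apparatus according to claim 1, wherein the apparatus further comprises:
    a data grouper, configured to group an original data bit stream to obtain the N groups of data bit streams, and to transmit the N groups of data bit streams to the ECC encoder.
  3. The apparatus according to claim 1 or 2, wherein the IO data streams in the Q groups of IO data streams have the same data length.
  4. The apparatus according to any one of claims 1 to 3, wherein the single-bit-error-correcting ECC encoding comprises an extended Hamming code for implementing single error correction and double error detection (SEC-DED).
  5. The apparatus according to claim 4, wherein the extended Hamming code for implementing SEC-DED comprises an optimal minimum odd-weight-column code.
  6. The apparatus according to any one of claims 1 to 5, wherein N is any one of 3, 4, 6, 8, or 16.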
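  7. A data transmission apparatus, comprising:
    Q IO pins, configured to receive Q groups of PAM-coded bit streams respectively;
    a PAM decoder, configured to perform PAM-N decoding on the Q groups of PAM-coded bit streams to obtain Q groups of IO data streams;
    a first aggregator, configured to aggregate the Q groups of IO data streams to obtain a first coded bit stream, wherein every N coded bits in the first coded bit stream form one unit;
    a de-interleaver, configured to read coded bits from the units of the first coded bit stream and reassemble them into N groups of ECC coded bit streams, wherein the N coded bits in each unit are assigned to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams comprise M data bits and P check bits; and
    an ECC decoder, configured to perform single-bit-error-correcting ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams;
    wherein N, M, P, and Q are positive integers.
  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a second aggregator, configured to aggregate the N groups of data bit streams into an original data bit stream.
  9. The apparatus according to claim 7 or 8, wherein the IO data streams in the Q groups of IO data streams have the same data length.
  10. The apparatus according to any one of claims 7 to 9, wherein the single-bit-error-correcting ECC encoding comprises an extended Hamming code for implementing SEC-DED.
  11. The apparatus according to claim 10, wherein the extended Hamming code for implementing SEC-DED comprises an optimal minimum odd-weight-column code.
  12. The apparatus according to any one of claims 7 to 11, wherein N is any one of 3, 4, 6, 8, or 16.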
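  13. A data transmission method, comprising:
    performing single-bit-error-correcting ECC encoding on N groups of data bit streams to obtain N groups of ECC coded bit streams, wherein each of the N groups of data bit streams comprises M data bits, and the coded bits in each of the N groups of ECC coded bit streams comprise M data bits and P check bits;
    reading coded bits from the N groups of ECC coded bit streams by polling and reassembling them into a first coded bit stream, wherein every N coded bits in the first coded bit stream form one unit, and the N coded bits in each unit come from different ECC coded bit streams;
    distributing the units of the first coded bit stream to Q IO pins to obtain Q groups of IO data streams;
    performing PAM-N encoding on the Q groups of IO data streams to obtain Q groups of PAM-coded bit streams; and
    sending the Q groups of PAM-coded bit streams through the Q IO pins;
    wherein N, M, P, and Q are positive integers.
  14. The method according to claim 13, wherein the method further comprises:
    grouping an original data bit stream to obtain the N groups of data bit streams.
  15. The method according to claim 13 or 14, wherein the IO data streams in the Q groups of IO data streams have the same data length.
  16. The method according to any one of claims 13 to 15, wherein the single-bit-error-correcting ECC encoding comprises an extended Hamming code for implementing SEC-DED.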
  17. The method according to claim 16, wherein the extended Hamming code for implementing SEC-DED comprises an optimal minimum odd-weight-column code.
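  18. The method according to any one of claims 13 to 17, wherein N is any one of 3, 4, 6, 8, or 16.
  19. A data transmission method, comprising:
    receiving Q groups of PAM-coded bit streams through Q IO pins;
    performing PAM-N decoding on the Q groups of PAM-coded bit streams to obtain Q groups of IO data streams;
    aggregating the Q groups of IO data streams to obtain a first coded bit stream, wherein every N coded bits in the first coded bit stream form one unit;
    reading coded bits from the units of the first coded bit stream and reassembling them into N groups of ECC coded bit streams, wherein the N coded bits in each unit are assigned to different ECC coded bit streams, and the coded bits in each of the N groups of ECC coded bit streams comprise M data bits and P check bits; and
    performing single-bit-error-correcting ECC decoding on the N groups of ECC coded bit streams to obtain N groups of data bit streams;
    wherein N, M, P, and Q are positive integers.
  20. The method according to claim 19, wherein the method further comprises:
    aggregating the N groups of data bit streams into an original data bit stream.
  21. The method according to claim 19 or 20, wherein the IO data streams in the Q groups of IO data streams have the same data length.
  22. The method according to any one of claims 19 to 21, wherein the single-bit-error-correcting ECC encoding comprises an extended Hamming code for implementing SEC-DED.
  23. The method according to claim 22, wherein the extended Hamming code for implementing SEC-DED comprises an optimal minimum odd-weight-column code.
  24. The method according to any one of claims 19 to 23, wherein N is any one of 3, 4, 6, 8, or 16.
  25. A computer-readable storage medium, wherein the readable storage medium is configured to store instructions, and when the instructions are executed, the method according to any one of claims 13 to 18 or the method according to any one of claims 19 to 24 is implemented.
  26. A computer program product, wherein the computer program product stores instructions, and when the instructions run on a computer, the method according to any one of claims 13 to 18 or the method according to any one of claims 19 to 24 is performed.
  27. A data transmission system, comprising the apparatus according to any one of claims 1 to 6 and the apparatus according to any one of claims 7 to 12.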
PCT/CN2022/128727 2022-10-31 2022-10-31 Data transmission method, apparatus and system WO2024092437A1 (zh)


Publications (1)

Publication Number Publication Date
WO2024092437A1 true WO2024092437A1 (zh) 2024-05-10

Family

ID=90929205


Country Status (1)

Country Link
WO (1) WO2024092437A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200278908A1 (en) * 2019-03-01 2020-09-03 Micron Technology, Inc. Extended error detection for a memory device
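CN112506730A (zh) * 2020-11-10 2021-03-16 PLA Strategic Support Force Information Engineering University Verification platform and verification method for verifying the ECC function of network switching chips
CN112860610A (zh) * 2019-11-27 2021-05-28 Intel Corporation Multi-protocol support on a common physical layer
CN113454602A (zh) * 2019-02-19 2021-09-28 Micron Technology, Inc. Memory device with configurable internal error correction modes
CN113495815A (zh) * 2020-04-07 2021-10-12 Intel Corporation Characterizing error correlation based on error logging for computer buses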



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22963757

Country of ref document: EP

Kind code of ref document: A1