WO2012149775A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2012149775A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
type
units
unit
packet
Prior art date
Application number
PCT/CN2011/080280
Other languages
French (fr)
Chinese (zh)
Inventor
王工艺
陈昊
郑伟
常胜
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201180001883.2A (CN102388385B)
Priority to PCT/CN2011/080280 (WO2012149775A1)
Publication of WO2012149775A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00: Baseband systems
    • H04L25/02: Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/14: Channel dividing arrangements, i.e. in which a single bit stream is divided between several baseband channels and reassembled at the receiver

Definitions

  • Embodiments of the present invention relate to the field of data processing and, more particularly, to methods and apparatus for data processing.
  • ASIC: application-specific integrated circuit
  • FPGA: field-programmable gate array
  • a variety of data packets are defined in the QPI (Quick Path Interconnect) protocol. Some of the data packets have a fixed length, and some have a variable length.
  • the NCS (Non Coherent Standard) packet is a variable-length packet, usually composed of 1 to 3 data units (flits), where each flit has a fixed length, for example 80 bits. The length of a variable-length NCS packet is available in its first flit.
  • NCS1 denotes an NCS packet with a length of 1 flit, that is, only one header flit;
  • NCS2 denotes an NCS packet with a length of 2 flits, that is, one header flit and one data flit;
  • NCS3 denotes an NCS packet with a length of 3 flits, that is, one header flit and two data flits.
  • there are other types of data packets, such as empty data packets. These other types of packets can be of fixed or variable length.
  • Embodiments of the present invention provide a data processing method and apparatus capable of performing alignment processing on a plurality of variable length packets based on a segmentation parallel mechanism.
  • a data processing method including: dividing, in order, input data containing N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; performing, in parallel, an alignment operation on the first type of data units in each of the M data segments, so that the first type of data units are shifted ahead of the other types of data units, where the other types of data units are all set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packets that need not be processed; and combining the M aligned data segments into output data containing N data units.
  • an apparatus for data processing including: a segmentation unit, configured to divide, in order, input data containing N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; a parallel processing unit, configured to perform, in parallel, an alignment operation on the first type of data units in each of the M data segments, so that the first type of data units are shifted ahead of the other types of data units, where the other types of data units are all set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packets that need not be processed; and a combination unit, configured to combine the M aligned data segments into output data containing N data units.
  • the data processing method and apparatus of the embodiments of the present invention segment the variable-length packets and perform alignment operations on the segments in parallel, which makes the design code easier to maintain, improves code coverage during design code verification, and significantly improves timing.
  • FIG. 1 is a flow chart of a method of data processing in accordance with an embodiment of the present invention.
  • FIG. 2 is a structural diagram of an apparatus for data processing according to an embodiment of the present invention.
  • FIG. 3 is a block diagram of a parallel processing unit in accordance with an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a specific process of data processing according to an embodiment of the present invention.
  • the embodiment of the present invention proposes to divide the input data in each clock cycle into multiple data segments, wherein the lengths of the data segments may be equal or different.
  • one data segment includes 2 data units
  • another data segment includes 3 data units, and so on.
  • individual data units may belong to different types, but other types of data units do not exist between data units of the same type.
  • the alignment operation is performed in parallel on a plurality of data units of the same type for each data segment in each clock cycle. After the aligned data units are obtained, the data segments obtained by the segments are combined step by step, and finally the output data of each clock cycle in which the same type of data units are aligned is obtained.
  • because the alignment operation in the embodiment of the present invention applies only to data units of the same type, data units of different types can be aligned in separate alignment processing devices. That is, one alignment processing device aligns data units of only one type, and data units that the device cannot process are treated as empty data packets. It can be understood that the alignment operations for different types of data units can also run in parallel within the same clock cycle.
  • the input data including N data units corresponding to the current clock cycle is sequentially divided into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N.
  • N is an even number
  • N is an odd number
  • the data packet is located at the end of the N data after the alignment operation of the N data units, so the alignment result of the N data units is not affected.
  • segments of 3 data units and segments of 2 data units can be combined to solve the problem of segmenting an odd number of data units.
  • the data units in the current data segment are aligned in parallel, once for each of the X unprocessed quantity identifiers; that is, the other types of data units in the current data segment are set to the empty data packet type, and the first type of data units are shifted ahead of the data units of the empty packet type, thereby obtaining X rearrangement combinations.
  • X corresponds to the maximum number of data units included in the first type of data packet, and the unprocessed number identification is used to indicate the number of unprocessed data units in the first type of data packet.
  • the method of combining the data units in the data segment directly with the X unprocessed quantity identifiers to obtain the X rearrangement combinations can identify the data type of the data units in the data segment. Since the number of data units in a data segment is limited, it is easy to exhaust all combinations of data units and unprocessed data identification.
  • an input and output logical table in which two data units are combined with three unprocessed number identifiers and three data units are combined with three unprocessed number identifiers will be given. Referring to such a logical table, the result of alignment of the data units in the data segment can be directly obtained.
  • the unprocessed quantity identifier is represented by a binary number. Taking the NCS packet data type as an example, 2'b00 indicates that there are no unprocessed NCS packet data units, 2'b01 indicates that one NCS packet data unit is still unprocessed, and 2'b10 indicates that two NCS packet data units are still unprocessed. As another example, if a data packet of another data type includes 5 data units, 3'b100, 3'b011, 3'b010, and 3'b001 can be used to indicate the number of unprocessed data units.
  • one rearrangement combination is selected from the X rearrangement combinations according to the selection identifier output after the alignment operation on the data units in the data segment preceding the current data segment, where the selection identifier is one of the X unprocessed quantity identifiers and represents the number of unprocessed data units of the first type of data packet in the current data segment.
  • the selection identifier output by the last data segment is latched and used as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.
  • the two adjacent data segments are first combined to obtain a new data segment, and then the adjacent two new data segments are combined to finally obtain output data including N data units.
  • the data processing method of the embodiment of the present invention segments the variable length packets and performs alignment operations on the segments in parallel, thereby facilitating maintenance of the design code, and improving the code coverage rate during design code verification, and significantly improving the timing.
  • the apparatus 20 for data processing includes a segmentation unit 21, a parallel processing unit 22, and a combination unit 23.
  • the segmentation unit 21 is configured to sequentially divide the input data including N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N.
  • the parallel processing unit 22 is configured to perform, in parallel, an alignment operation on the first type of data units in each of the M data segments, shifting the first type of data units ahead of the other types of data units, where the other types of data units can be set to the empty packet type. The first type is the type of packet to be processed, and the other types are types of packets that need not be processed.
  • Parallel processing can also be implemented in each data segment, and the type of data is identified.
  • the parallel processing unit 22 further includes a parallel processing module 221 and a selection module 222.
  • the parallel processing module 221 is configured to perform, in parallel, an alignment operation on the data units in the current data segment for each of the X unprocessed quantity identifiers, to obtain X rearrangement combinations, where X corresponds to the maximum number of data units included in a data packet of the first type, and the unprocessed quantity identifier indicates the number of unprocessed data units in the first type of data packet.
  • the parallel processing module 221 completes the identification of the data type by the comparison with the logical table, thereby setting other types of data units in the current data segment to the empty packet type.
  • the selecting module 222 is configured to select one rearrangement combination from the X rearrangement combinations according to the selection identifier output after the alignment operation on the data units in the data segment preceding the current data segment, where the selection identifier is one of the X unprocessed quantity identifiers and indicates the number of unprocessed data units of the first type of data packet in the current data segment.
  • the parallel processing unit 22 may further include a latch module 223 for latching the selection identifier output by the last data segment, to be used as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.
  • the combining unit 23 of the data processing device 20 can combine, stage by stage and in order, each two adjacent aligned data segments into output data comprising N data units, in which the first type of data units among the N data units are shifted ahead of the other types of data units.
  • the apparatus for data processing of the embodiment of the present invention segments the variable-length packets and performs alignment operations on the segments in parallel, which makes the design code easier to maintain, improves code coverage during design code verification, and significantly improves timing.
  • 8 flits need to be processed and stored in parallel, and these 8 flits can be a combination of any type of protocol packets, resulting in multiple possibilities for the same type of packets contained in 8 flits.
  • take the NCS packet as an example: 8 flits may contain 0 to 8 NCS1s, or 0 to 4 NCS2s, or 0 to 3 NCS3s, or a combination of NCS1, NCS2, and NCS3, or a combination of NCS packets and other packets, and their positions within the 8 flits are uncertain; moreover, an NCS2 or NCS3 may span multiple 8-flit groups.
  • Table 1 below gives schematic input data for 8 flits corresponding to clocks N to N+3.
  • X denotes a "bubble" that need not be considered, that is, an empty packet, and "others" denotes packets of data types other than NCS.
  • "_1", "_2", and "_3" indicate the first, second, and third flit of an NCS packet.
  • the eight flits processed at the same time contain three NCS packets, that is, one NCS1 and two NCS3s. One of the NCS3s spans 8-flit groups: only one of its flits is in the 8 flits currently processed, and its other 2 flits are not.
  • at clock N + 1, there is only one NCS-type flit among the 8 flits processed at the same time. It must be known that this flit is the first data flit of the NCS3 whose header was received in the previous clock cycle. And so on.
  • the output data obtained by the method based on the segmentation parallel data processing according to the embodiment of the present invention as shown in Table 1 is as shown in Table 2 below.
  • since only one type of data is processed, for example only NCS-type data in this example, packets of types other than NCS are identified as empty packets.
  • the data processing process based on segmentation parallelism is the process of "squeezing out" empty packets in segments and in parallel.
  • the input data with 8 flits is divided into data segments of 2 flits each, producing 4 data segments in total. The four data segments are then aligned in parallel to obtain four aligned data segments. Next, the four data segments are combined pairwise, in parallel, into two new data segments, each with four aligned flits. Finally, the two new data segments are combined into output data with 8 aligned flits.
  • the two flits in each data segment are aligned in parallel for each value of the unprocessed number identifier, and three rearrangement combinations are obtained.
  • after segmentation, the flits of one NCS packet may be split across different data segments, so the "unprocessed number identifier" is used to indicate how many flits of the NCS packet being processed have not yet been processed.
  • the binary numbers 00, 01, and 10 indicate that the number of unprocessed flits remaining for the next data segment is 0, 1, or 2, respectively.
  • Table 3 below shows all 22 combinations of input and output for the parallel processing module to align 2 flit.
  • the combination of 8 flits is transformed into the output of 4 data segments through the segmentation operation, where the flit is aligned.
  • the data segments are then combined two by two, which reduces complexity.
  • because the alignment operations on the data segments of various lengths all run in parallel, the timing is greatly improved.
  • FIG. 4 is a schematic diagram of a process of data processing according to an embodiment of the present invention.
  • PPM: Parallel Process Module
  • PPM0 operates on flit0/1
  • PPM1 operates on flit2/3
  • PPM2 operates on flit4/5
  • PPM3 operates on flit6/7.
  • PPG: Parallel Process Group
  • the unprocessed quantity identifier contx differs for each parallel processing group, which results in three different rearrangement combinations. Then, according to the selection identifier contxl output by the previous data segment, one of the rearrangement combinations output for the current data segment is selected.
  • the unprocessed quantity identifier contx ranges from 0 to 2 and indicates the number of remaining flits, from the external input data, that have not yet been processed.
  • the selection identifier contx_q of the last data segment PPM3 is latched as the selection identifier contxl of PPM0 for the flit0/1 operation of the next clock cycle. Since the parallel processing of the PPG identifies data types, packets of data types other than the NCS packet type are set to empty packets. It can be seen that each alignment operation is a process of squeezing out unwanted "bubbles" (empty packets).
  • flit0 is the header flit (H) of an NCS1
  • flit1 is an empty packet (X)
  • referring to Table 3, corresponding to 2'b00, 2'b01 and 2
  • the output is:
  • the next selection identifier contxl 2'b00, the number of flits in the aligned data segment (1), and the result of flit alignment.
  • the output is:
  • the next selection identifier contx 2'b10, the number of flits in the aligned data segment (1), and the result of flit alignment.
  • the next selection identifier contx 2'b10, the number of flits in the aligned data segment (2), and the result of flit alignment.
  • the operation on the 8 flits at clock (N + 1) is the same as that described above.
  • the adjacent data segments are combined pairwise in the first stage to obtain two new data segments, each having four flits, and within those four flits the "bubbles" between the NCS-type flits are gone.
  • the final output data including 8 flits is obtained, and all the "bubbles" between the NCS-type flits in the 8 flits are gone. It can also be seen that the number of NCS-type flits among the eight flits of the current clock is five. This greatly simplifies subsequent storage operations.
  • the data processing method and apparatus of the embodiments of the present invention segment the variable-length packets and perform alignment operations on the segments in parallel, which makes the design code easier to maintain, improves code coverage during design code verification, and significantly improves timing.
  • a program product which enables a processor running the program product to implement the following functions: first, the input data containing N data units corresponding to the current clock cycle is divided, in order, into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; then, the first type of data units in each of the M data segments are aligned in parallel so that the first type of data units are shifted ahead of the other types of data units, where the other types of data units are set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packets that need not be processed; finally, the M aligned data segments are combined into output data containing N data units.
  • the aligned output data obtained in each clock cycle is stored in the buffer in order; since no empty-packet data units or data units of other types lie between the data units to be processed in the output data, storage space is used efficiently.
  • an information storage medium for storing the above program product is provided.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the components displayed for the unit may or may not be physical units, ie may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and the like, which can store program codes.
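
The unprocessed quantity identifier mechanism described in the list above (each 2-flit module computes one candidate result per possible contx value, a selection identifier from the previous segment picks one, and the last identifier is latched for the next clock) can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the patent's logic table: the flit tags "H1"/"H2"/"H3" (header carrying the packet length), "D" (data flit), "X" (empty) and "O" (other type), and the function names `align_segment` and `process_clock`, are hypothetical, and the final combination is done by simple concatenation rather than by staged pairwise merging.

```python
from typing import List, Tuple

EMPTY = "X"                               # empty packet ("bubble")
HEADER_LEN = {"H1": 1, "H2": 2, "H3": 3}  # hypothetical header tags -> packet length in flits
MAX_PENDING = 2                           # an NCS3 header leaves at most 2 data flits pending

def align_segment(seg: List[str], contx: int) -> Tuple[List[str], int, int]:
    """Align one segment, assuming `contx` NCS data flits are still pending on entry.

    Returns (aligned segment, number of kept flits, contx on exit).
    """
    kept = []
    for flit in seg:
        if flit in HEADER_LEN:            # a header starts a new NCS packet
            kept.append(flit)
            contx = HEADER_LEN[flit] - 1
        elif flit == "D" and contx > 0:   # data flit of a packet still in progress
            kept.append(flit)
            contx -= 1
        # anything else is treated as an empty packet and squeezed out
    return kept + [EMPTY] * (len(seg) - len(kept)), len(kept), contx

def process_clock(flits: List[str], contx_in: int, seg_len: int = 2) -> Tuple[List[str], int]:
    """Process the flits of one clock cycle; returns (aligned output, contx to latch)."""
    segs = [flits[i:i + seg_len] for i in range(0, len(flits), seg_len)]
    # Speculative step: every segment computes X = 3 candidates independently,
    # which is what allows the modules to work in parallel in hardware.
    candidates = [[align_segment(s, c) for c in range(MAX_PENDING + 1)] for s in segs]
    # Selection step: the identifier output by the previous segment picks a candidate.
    out, sel = [], contx_in
    for per_segment in candidates:
        aligned, count, sel = per_segment[sel]
        out.extend(aligned[:count])
    return out + [EMPTY] * (len(flits) - len(out)), sel
```

With the hypothetical input `["H1", "X", "O", "H3", "D", "D", "X", "H3"]` and a latched identifier of 0, five NCS flits end up at the front and the identifier passed to the next clock is 2, because the last NCS3 has only its header in the current group.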

Abstract

Provided are a data processing method and device. The data processing method includes: dividing, in order, the input data containing N data units corresponding to the current clock cycle into M data segments, wherein M and N are both positive integers, N is greater than or equal to 2, and M is less than N; performing, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted ahead of the data units of other types, wherein the data units of other types are all set to an empty data packet type, the first type is a data packet type to be processed, and the other types are types not to be processed; and combining the M aligned data segments into output data containing N data units. The data processing device includes a segmentation unit, a parallel processing unit and a combination unit. The data processing method and device in the embodiments of the present invention segment variable-length packets and perform an alignment operation on each of the segments in parallel, which makes the design code easier to maintain, increases code coverage during design code verification, and markedly improves timing.

Description

Data processing method and device
Technical field
Embodiments of the present invention relate to the field of data processing and, more particularly, to a data processing method and device.
Background
In ASIC (application-specific integrated circuit) or FPGA (field-programmable gate array) designs, it is often necessary to process data packets defined by various protocols. These packets may belong to different types, and even packets of the same type may have variable lengths.
The QPI (Quick Path Interconnect) protocol defines several kinds of data packets; some have a fixed length and others have a variable length. For example, the NCS (Non Coherent Standard) packet is a variable-length packet, usually composed of 1 to 3 data units (flits), where each flit has a fixed length, for example 80 bits. The length of a variable-length NCS packet can be obtained from its first flit. For ease of description, "NCS1" denotes an NCS packet 1 flit long, that is, only a header flit; "NCS2" denotes an NCS packet 2 flits long, that is, one header flit and one data flit; "NCS3" denotes an NCS packet 3 flits long, that is, one header flit and two data flits. Besides NCS packets there are other types of data packets, such as empty data packets. These other types of packets may be of fixed or variable length.
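As a rough illustration of the packet structure just described, the following Python sketch models an NCS packet as one header flit followed by zero to two data flits. The `Flit` class, its `remaining` field, and the `make_ncs` helper are hypothetical names introduced here; the actual QPI header encoding is not given in the text and is not modeled.

```python
from dataclasses import dataclass
from typing import List

FLIT_BITS = 80  # example flit width mentioned in the text

@dataclass(frozen=True)
class Flit:
    kind: str           # "H" = NCS header flit, "D" = NCS data flit, "X" = empty packet
    remaining: int = 0  # for a header: number of data flits that follow (assumed field)

def make_ncs(length: int) -> List[Flit]:
    """Build an NCS packet of 1 to 3 flits: one header plus (length - 1) data flits."""
    if not 1 <= length <= 3:
        raise ValueError("an NCS packet has 1 to 3 flits")
    return [Flit("H", length - 1)] + [Flit("D")] * (length - 1)
```

Under this model, `make_ncs(3)` yields a header whose `remaining` field is 2, which mirrors the statement that the packet length can be read from the first flit.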
In some ASIC or FPGA application designs, aligned packets of the same type must be stored in one of several different buffers, so the data units in packets of the same type need to be aligned; during alignment, packets of other types are treated as empty packets. The "alignment operation" is the process of "squeezing out" the empty packets between those data units. Suppose the input in a given clock cycle has N data units (where N is a positive integer); aligning these N data units then requires considering about 2^N shift possibilities. Moreover, when the code of such a design is verified after the design is complete, if the input contains L data types (where L is a positive integer), there are L^N input combinations to verify. Clearly, exhausting all combinations is not easy. To improve code coverage, verification engineers have to check the code against as many data unit combinations as possible, which inevitably adds unnecessary work. At the same time, it is also difficult for the timing to meet the system requirements.
Summary of the invention
Embodiments of the present invention provide a data processing method and device capable of aligning multiple variable-length packets based on a segmented parallel mechanism.
In one aspect, a data processing method is provided, including: dividing, in order, input data containing N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; performing, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted ahead of the data units of other types, where the data units of other types are all set to the empty packet type, the first type is the packet type to be processed, and the other types are packet types that need not be processed; and combining the M aligned data segments into output data containing N data units.
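The three steps of the method (segment, align each segment, combine) can be sketched in Python as below. This is a simplified software model under stated assumptions: each data unit is represented only by a type tag, "X" marks an empty packet, packets that span segments are not modeled, and the function names are hypothetical. A list comprehension stands in for what would be parallel hardware.

```python
from typing import List

EMPTY = "X"  # marker for an empty packet (a "bubble")

def split(units: List[str], seg_len: int) -> List[List[str]]:
    """Divide the N input units, in order, into segments of seg_len units."""
    return [units[i:i + seg_len] for i in range(0, len(units), seg_len)]

def align_segment(seg: List[str], wanted: str) -> List[str]:
    """Keep units of the wanted type at the front; everything else becomes EMPTY."""
    kept = [u for u in seg if u == wanted]
    return kept + [EMPTY] * (len(seg) - len(kept))

def combine(left: List[str], right: List[str]) -> List[str]:
    """Merge two aligned segments so that no EMPTY sits between kept units."""
    kept = [u for u in left + right if u != EMPTY]
    return kept + [EMPTY] * (len(left) + len(right) - len(kept))

def process(units: List[str], wanted: str, seg_len: int = 2) -> List[str]:
    # In hardware every segment is aligned at the same time; the list
    # comprehension only models the result, not the parallelism.
    segs = [align_segment(s, wanted) for s in split(units, seg_len)]
    while len(segs) > 1:  # stage-by-stage pairwise combination
        merged = [combine(segs[i], segs[i + 1]) for i in range(0, len(segs) - 1, 2)]
        if len(segs) % 2:
            merged.append(segs[-1])  # an odd segment passes through to the next stage
        segs = merged
    return segs[0] if segs else []
```

For the hypothetical input `["X", "N", "O", "N", "N", "X", "O", "N"]` with wanted type `"N"`, the four 2-unit segments each align to `["N", "X"]`, and two combination stages produce four `"N"` units followed by four `"X"` units. One way to read the benefit: each 2-unit segment has only 2^2 = 4 keep-or-drop patterns to handle, compared with 2^8 = 256 for the undivided 8-unit input.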
In another aspect, a data processing device is provided, including: a segmentation unit, configured to divide, in order, input data containing N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; a parallel processing unit, configured to perform, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted ahead of the data units of other types, where the data units of other types are all set to the empty packet type, the first type is the packet type to be processed, and the other types are packet types that need not be processed; and a combination unit, configured to combine the M aligned data segments into output data containing N data units.
The data processing method and device of the embodiments of the present invention segment variable-length packets and align the segments in parallel, which makes the design code easier to maintain, improves code coverage during design code verification, and significantly improves timing.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention.
FIG. 2 is a structural diagram of a data processing device according to an embodiment of the present invention.
FIG. 3 is a structural diagram of a parallel processing unit according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a specific data processing procedure according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
为了解决现有技术中对齐操作情况复杂且时序较低的问题, 本发明实施 例提出将每个时钟周期中的输入数据分为多个数据段, 其中各个数据段的长 度可以相等也可以不等, 例如一个数据段包括 2个数据单元, 另一个数据段 包括 3个数据单元, 等等。 此外, 各个数据单元可以属于不同类型, 但是属 于同种类型的数据单元之间不会存在其他类型的数据单元。 并行地对每个时 钟周期中的各个数据段的同种类型的多个数据单元进行对齐操作。 在得到对 齐的数据单元之后, 再对这些分段得到的数据段进行逐级组合, 最终得到同 种类型数据单元对齐的各个时钟周期的输出数据。 需要说明的是, 由于本发 明实施例中的对齐操作仅针对同一类型的数据单元而言, 因此不同类型的数 据单元可以分别在不同的对齐处理装置中进行对齐操作。 即一个对齐处理装 置中只会对同一类型的数据单元进行对齐处理, 而该对齐处理装置无法处理 的数据单元将被视为空数据包。 可以理解, 同一时钟周期中, 针对不同类型 的数据单元的对齐操作也可以并行。  In order to solve the problem that the alignment operation in the prior art is complicated and the timing is low, the embodiment of the present invention proposes to divide the input data in each clock cycle into multiple data segments, wherein the lengths of the data segments may be equal or different. For example, one data segment includes 2 data units, another data segment includes 3 data units, and so on. In addition, individual data units may belong to different types, but other types of data units do not exist between data units of the same type. The alignment operation is performed in parallel on a plurality of data units of the same type for each data segment in each clock cycle. After the aligned data units are obtained, the data segments obtained by the segments are combined step by step, and finally the output data of each clock cycle in which the same type of data units are aligned is obtained. It should be noted that since the alignment operation in the embodiment of the present invention is only for the same type of data unit, different types of data units can perform alignment operations in different alignment processing devices, respectively. That is, only one data unit of the same type is aligned in one alignment processing device, and the data unit that the alignment processing device cannot process will be regarded as an empty data packet. It can be understood that the alignment operations for different types of data units can also be parallel in the same clock cycle.
以下结合图 1描述根据本发明实施例的数据处理的方法。  A method of data processing according to an embodiment of the present invention is described below with reference to FIG.
11. Divide, in order, the input data that corresponds to the current clock cycle and contains N data units into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N.
Generally, if N is even, the N data units are naturally divided into M = N/2 data segments of 2 data units each. A segment of 2 data units has the fewest rearrangement cases to consider, so using 2 data units per segment is optimal.
If N is odd, one empty-packet data unit can be appended after the N data units, and the resulting (N+1) data units are divided into M = (N+1)/2 data segments. Because empty packets always end up after the N data units once these are aligned, the padding does not affect the alignment result of the N data units. Alternatively, segments of 3 data units can be mixed with segments of 2 data units to handle an odd number of data units.
Of course, those skilled in the art will understand that N data units can be divided using many other combinations of segment lengths.
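The segmentation in step 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_segments` and the marker `"X"` for an empty data unit are hypothetical, and only the "pad with one empty unit" option for odd N is shown.

```python
from typing import List

EMPTY = "X"  # assumed marker for an empty data unit (bubble)


def split_into_segments(units: List[str], seg_len: int = 2) -> List[List[str]]:
    """Split N data units, in order, into segments of seg_len units.

    If N is not a multiple of seg_len, empty units are appended at the end,
    which mirrors the padding option described for odd N.
    """
    padded = list(units)
    while len(padded) % seg_len != 0:
        padded.append(EMPTY)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


even = split_into_segments(["u0", "u1", "u2", "u3", "u4", "u5", "u6", "u7"])
odd = split_into_segments(["u0", "u1", "u2", "u3", "u4"])
print(len(even), even[0])  # 4 segments of two units each
print(len(odd), odd[-1])   # 3 segments; the last one is padded with "X"
```

With N = 8 this gives M = 4, and with N = 5 it gives M = (5+1)/2 = 3, matching the two cases described above.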
12. In parallel, perform an alignment operation on the data units of the first type in each of the M data segments, so that data units of the first type are shifted ahead of data units of other types, where the first type is the packet type to be processed and the other types are packet types that need not be processed.
For each data segment, the data units in the current data segment are aligned in parallel under each of X unprocessed-count identifiers: data units of other types in the current data segment are set to the empty-packet type, and data units of the first type are shifted ahead of the empty-packet data units, which yields X rearrangement combinations. Here, X corresponds to the maximum number of data units contained in a packet of the first type, and the unprocessed-count identifier indicates the number of data units of a first-type packet that have not yet been processed.
Note that combining the data units of a data segment directly with the X unprocessed-count identifiers to obtain X rearrangement combinations also identifies the data type of each data unit in the segment. Because the number of data units in a data segment is limited, all combinations of data units and unprocessed-count identifiers can easily be enumerated. The specific example below gives the input/output logic tables for 2 data units combined with 3 unprocessed-count identifiers and for 3 data units combined with 3 unprocessed-count identifiers. With such a logic table, the aligned result for the data units of a segment can be looked up directly.
In addition, the unprocessed-count identifier is represented as a binary number. Taking the NCS packet type as an example, 2'b00 indicates that no NCS data unit remains unprocessed, 2'b01 indicates that 1 NCS data unit remains unprocessed, and 2'b10 indicates that 2 NCS data units remain unprocessed. As another example, if packets of another data type contain 5 data units, 3'b100, 3'b011, 3'b010 and 3'b001 can represent the number of unprocessed data units.
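As a rough illustration of this encoding, the sketch below derives the field width from the maximum packet length and prints the count as a Verilog-style sized literal. The helper names `contx_width` and `encode_contx` are hypothetical; the only assumption is that a packet of at most L data units can leave between 0 and L-1 data units unprocessed, so a 5-unit packet needs a 3-bit field.

```python
def contx_width(max_units: int) -> int:
    """Bits needed to count 0 .. max_units - 1 unprocessed data units."""
    return max(1, (max_units - 1).bit_length())


def encode_contx(remaining: int, max_units: int) -> str:
    """Format the unprocessed count as a Verilog-style sized binary literal."""
    if not 0 <= remaining < max_units:
        raise ValueError("remaining must be between 0 and max_units - 1")
    width = contx_width(max_units)
    return f"{width}'b{remaining:0{width}b}"


# NCS packets have at most 3 data units, so the count fits in 2 bits.
print([encode_contx(r, 3) for r in range(3)])     # ["2'b00", "2'b01", "2'b10"]
# A packet type with 5 data units needs 3 bits.
print([encode_contx(r, 5) for r in range(1, 5)])  # ["3'b001", "3'b010", "3'b011", "3'b100"]
```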
After the X rearrangement combinations have been obtained in parallel, one of them is selected according to the selection identifier output by the alignment operation on the data segment that precedes the current one. The selection identifier is one of the X unprocessed-count identifiers and indicates the number of data units of a first-type packet that remain unprocessed in the current data segment.
Afterwards, the selection identifier output by the last data segment is latched and serves as the selection identifier for the alignment of the first data segment in the next clock cycle.
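The "compute X candidates, then select one" idea can be sketched as follows. This is a software model under assumptions, not the hardware design: flits are written as the strings `"X"` (empty), `"D"` (data flit) and `"H1"`, `"H2"`, `"H3"` (header of a packet of that length), a segment is listed in arrival order (flit0 first), and a hypothesis that does not fit the segment yields `None`. In hardware the candidates would be computed concurrently; here they are computed in a list comprehension.

```python
from typing import List, Optional, Tuple

MAX_LEN = 3  # longest first-type packet in flits (NCS3), so contx is 0..2

Result = Tuple[List[str], int]  # (aligned segment, outgoing contx)


def align_segment(segment: List[str], contx: int) -> Optional[Result]:
    """Align one segment under the hypothesis that `contx` flits are pending.

    Returns None when the segment is inconsistent with that hypothesis.
    """
    kept = []
    for flit in segment:
        if flit == "X":
            continue                   # bubble: squeezed out
        if flit == "D" and contx > 0:
            contx -= 1                 # a pending data flit arrives
        elif flit.startswith("H") and contx == 0:
            contx = int(flit[1:]) - 1  # header announces the packet length
        else:
            return None                # hypothesis does not fit this segment
        kept.append(flit)
    return kept + ["X"] * (len(segment) - len(kept)), contx


def process_segment(segment: List[str], contx_in: int) -> Result:
    # Speculatively compute one candidate per possible contx value ...
    candidates = [align_segment(segment, c) for c in range(MAX_LEN)]
    # ... then pick the one matching the previous segment's output.
    chosen = candidates[contx_in]
    if chosen is None:
        raise ValueError("segment does not match the incoming contx")
    return chosen


print(process_segment(["H3", "X"], 0))  # (['H3', 'X'], 2)
print(process_segment(["X", "D"], 2))   # (['D', 'X'], 1)
```

The outgoing `contx` of one call is the `contx_in` of the next segment, and the value returned for the last segment is what would be latched for the next clock cycle.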
13. Combine the M aligned data segments into output data containing N data units.
Optionally, adjacent pairs of aligned data segments are combined in order, level by level, into output data containing N data units, so that among the N data units the data units of the first type are shifted ahead of data units of other types.
For example, adjacent data segments are first combined in pairs to obtain new data segments, and adjacent new data segments are then combined in pairs, until output data containing N data units is obtained.
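The level-by-level combination can be sketched as a pairwise tree merge. The function names `merge` and `combine` are hypothetical, and `"X"` is again an assumed marker for an empty data unit; carrying an unpaired segment up unchanged is a choice made for this sketch.

```python
from typing import List


def merge(left: List[str], right: List[str]) -> List[str]:
    """Combine two aligned segments: valid units of left, then of right, then padding."""
    valid = [u for u in left if u != "X"] + [u for u in right if u != "X"]
    return valid + ["X"] * (len(left) + len(right) - len(valid))


def combine(segments: List[List[str]]) -> List[str]:
    """Merge adjacent segments level by level until one segment remains."""
    if not segments:
        return []
    level = [list(s) for s in segments]
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired segment is carried up unchanged
        level = nxt
    return level[0]


aligned = [["H1", "X"], ["H3", "X"], ["D", "X"], ["D", "H3"]]
print(combine(aligned))  # ['H1', 'H3', 'D', 'D', 'H3', 'X', 'X', 'X']
```

For four 2-unit segments this performs two levels of merging: 4 segments of 2 units, then 2 segments of 4 units, then 1 output of 8 units.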
The data processing method of the embodiments of the present invention segments variable-length packets and aligns the segments in parallel, which makes the design code easier to maintain, improves code coverage during design verification, and significantly improves timing.
A data processing apparatus according to an embodiment of the present invention is described below with reference to FIG. 2.
As shown in FIG. 2, the data processing apparatus 20 includes a segmentation unit 21, a parallel processing unit 22, and a combining unit 23.
Specifically, the segmentation unit 21 is configured to divide, in order, the input data that corresponds to the current clock cycle and contains N data units into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N. The parallel processing unit 22 is configured to perform, in parallel, an alignment operation on the data units of the first type in each of the M data segments, so that data units of the first type are shifted ahead of data units of other types, where data units of other types may be set to the empty-packet type. Here, the first type is the packet type to be processed, and the other types are packet types that need not be processed. The combining unit 23 is configured to combine the M aligned data segments into output data containing N data units.
At the same time, parallel processing can also be performed within each data segment, and the data type can be identified. For example, as shown in FIG. 3, the parallel processing unit 22 further includes a parallel processing module 221 and a selection module 222. The parallel processing module 221 is configured to align the data units in the current data segment in parallel under each of X unprocessed-count identifiers, obtaining X rearrangement combinations, where X corresponds to the maximum number of data units contained in a packet of the first type and the unprocessed-count identifier indicates the number of data units of a first-type packet that have not yet been processed. The parallel processing module 221 identifies the data type by lookup against the logic table and thereby sets data units of other types in the current data segment to the empty-packet type. The selection module 222 is configured to select one of the X rearrangement combinations according to the selection identifier output by the alignment operation on the data segment that precedes the current one, where the selection identifier is one of the X unprocessed-count identifiers and indicates the number of data units of a first-type packet that remain unprocessed in the current data segment.
In addition, the parallel processing unit 22 may further include a latch module 223, configured to latch the selection identifier output by the last data segment for use as the selection identifier for the alignment of the first data segment in the next clock cycle.
Finally, once the data units in every data segment have been aligned, the combining unit 23 of the data processing apparatus 20 can combine adjacent pairs of aligned data segments in order, level by level, into output data containing N data units, so that among the N data units the data units of the first type are shifted ahead of data units of other types.
The data processing apparatus of the embodiments of the present invention therefore segments variable-length packets and aligns the segments in parallel, which makes the design code easier to maintain, improves code coverage during design verification, and significantly improves timing.
The following uses a specific example from an FPGA application design scenario to illustrate how the data processing of the embodiments of the present invention is implemented.
In an FPGA application design, 8 flits must be processed and stored in parallel, continuously, to meet the rate requirement. These 8 flits can be any combination of protocol packet types, so packets of one type can appear within the 8 flits in many ways. Taking NCS packets as an example, the 8 flits may contain 0 to 8 NCS1 packets, 0 to 4 NCS2 packets, 0 to 3 NCS3 packets, a combination of NCS1, NCS2 and NCS3 packets, or a combination of NCS packets and other packets, and their positions within the 8 flits are not fixed. In addition, an NCS2 or NCS3 packet may span more than one group of 8 flits.
Table 1 below gives schematic input data of 8 flits for clocks N to N+3.
[Table 1] Schematic input data of 8 parallel flits
[Table image: imgf000008_0001]
Here, "X" denotes a "bubble" that need not be considered, that is, an empty packet, and "others" denotes a packet of a data type other than NCS. The suffixes "_1", "_2" and "_3" denote the first, second and third flit of an NCS packet, respectively.
As shown in Table 1, at clock N the 8 flits processed together contain flits of three NCS packets: one NCS1 and two NCS3. One of the NCS3 packets extends beyond the 8 flits: only one of its flits is among the 8 flits currently being processed, and its other 2 flits are not. At clock N+1, only one of the 8 flits processed together is an NCS-type flit, and it is necessary to know which flit of the NCS3 packet received in the previous clock cycle it is. The same applies to the following clocks.
Table 2 below shows the output data obtained by applying the segment-parallel data processing method of the embodiments of the present invention to the input data in Table 1.
[Table 2] Schematic output data of 8 parallel flits after processing
[Table image: imgf000009_0001]
A comparison shows that, because only one type of data is processed (in this example only NCS-type data), packets of types other than NCS are identified as empty packets. Segment-parallel data processing is thus a process of "squeezing out" empty packets segment by segment, in parallel.
The data processing method and apparatus of the embodiments of the present invention are applied here in the simplest design form. First, the 8-flit input data is divided into data segments of 2 flits each, producing 4 data segments. The 4 data segments are then aligned in parallel, giving 4 aligned data segments. Next, these 4 data segments are taken two at a time and combined in parallel into new data segments of 4 aligned flits each. Finally, the 2 new data segments are combined into output data of 8 aligned flits.
Specifically, while the 4 data segments are being aligned in parallel, the 2 flits of each data segment are also aligned in parallel under each unprocessed-count identifier, yielding 3 rearrangement combinations. As noted above, after segmentation the flits of one NCS packet may fall into different data segments, so the unprocessed-count identifier indicates how many flits of the NCS packet being processed may still be unprocessed. In this example, an NCS packet has at most 3 flits (NCS3), so the binary values 00, 01 and 10 indicate that 0, 1 or 2 unprocessed flits, respectively, remain for the next data segment.
Table 3 below gives all 22 input/output combinations for the alignment of 2 flits by the parallel processing module.
[Table 3] Input/output combinations of PPM processing (2 flits)
| Input contx | Input Flit1 | Input Flit0 | Output contx | Output data | Flit count |
| --- | --- | --- | --- | --- | --- |
| 2'b00 | X | X | 2'b00 | {X,X} | 0 |
| 2'b00 | H (Len=1) | X | 2'b00 | {X,H} | 1 |
| 2'b00 | H (Len=2) | X | 2'b01 | {X,H} | 1 |
| 2'b00 | H (Len=3) | X | 2'b10 | {X,H} | 1 |
| 2'b00 | X | H (Len=1) | 2'b00 | {X,H} | 1 |
| 2'b00 | H (Len=1) | H (Len=1) | 2'b00 | {H,H} | 2 |
| 2'b00 | H (Len=2) | H (Len=1) | 2'b01 | {H,H} | 2 |
| 2'b00 | H (Len=3) | H (Len=1) | 2'b10 | {H,H} | 2 |
| 2'b00 | X | H (Len=2) | 2'b01 | {X,H} | 1 |
| 2'b00 | D | H (Len=2) | 2'b00 | {D,H} | 2 |
| 2'b00 | X | H (Len=3) | 2'b10 | {X,H} | 1 |
| 2'b00 | D | H (Len=3) | 2'b01 | {D,H} | 2 |
| 2'b01 | X | X | 2'b01 | {X,X} | 0 |
| 2'b01 | D | X | 2'b00 | {X,D} | 1 |
| 2'b01 | X | D | 2'b00 | {X,D} | 1 |
| 2'b01 | H (Len=1) | D | 2'b00 | {H,D} | 2 |
| 2'b01 | H (Len=2) | D | 2'b01 | {H,D} | 2 |
| 2'b01 | H (Len=3) | D | 2'b10 | {H,D} | 2 |
| 2'b10 | X | X | 2'b10 | {X,X} | 0 |
| 2'b10 | D | X | 2'b01 | {X,D} | 1 |
| 2'b10 | X | D | 2'b01 | {X,D} | 1 |
| 2'b10 | D | D | 2'b00 | {D,D} | 2 |
(H: head flit; D: data flit; X: empty packet)
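The 22 rows of the 2-flit table can be reproduced by brute-force enumeration. The sketch below is a check under an assumed validity model inferred from the rows of Table 3: a header flit may only appear when no data flits are pending, a data flit may only appear when at least one is pending, and an empty flit may appear anywhere. Flits are written as the hypothetical strings `"X"`, `"D"`, `"H1"`, `"H2"`, `"H3"`, and each segment is listed in arrival order (flit0 first), whereas the table prints Flit1 before Flit0.

```python
from itertools import product

FLITS = ["X", "D", "H1", "H2", "H3"]


def step(flit, contx):
    """Return the contx after one flit, or None if the flit cannot occur in that state."""
    if flit == "X":
        return contx
    if flit == "D":
        return contx - 1 if contx > 0 else None
    return int(flit[1:]) - 1 if contx == 0 else None  # header flit


def table_rows(seg_len=2, max_len=3):
    """Enumerate (input contx, segment, output contx, flit count) for every valid input."""
    rows = []
    for contx_in in range(max_len):
        for segment in product(FLITS, repeat=seg_len):  # arrival order: flit0 first
            contx, count = contx_in, 0
            for flit in segment:
                nxt = step(flit, contx)
                if nxt is None:
                    break  # invalid combination: not a table row
                contx = nxt
                count += flit != "X"
            else:
                rows.append((contx_in, segment, contx, count))
    return rows


rows = table_rows()
print(len(rows))  # 22
print([sum(1 for r in rows if r[0] == c) for c in range(3)])  # [12, 6, 4]
```

The per-identifier counts of 12, 6 and 4 rows match the three groups of the 2-flit table. The same model has only been checked against the 2-flit case here.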
As Table 3 shows, the parallel processing module that handles 2 flits only needs to cover the possible combinations of the 2 flits currently being processed. The 2 flits currently being processed may belong to a packet that started in a preceding data segment, and it must be known exactly which flit of that packet each one is. Each data segment therefore passes the number of remaining flits to the next data segment, and the remaining count produced by the last data segment is used as the input of the first data segment in the next clock cycle, so the processing of the subsequent 8 flits is not blocked. Segmentation turns the 8-flit combination into 4 output data segments whose flits are aligned, and the data segments are then combined in pairs, which ultimately reduces complexity. Because the alignment operations on data segments of every length run in parallel, timing is also greatly improved.
Similarly, for the alignment processing of a parallel processing module that handles 3 flits, refer to Table 4, which contains only a limited set of 42 combinations.
[Table 4] Input/output combinations of PPM processing (3 flits)
[Table image: imgf000011_0001 (first part of Table 4)]
| Input contx | Input Flit2 | Input Flit1 | Input Flit0 | Output contx | Output data | Flit count |
| --- | --- | --- | --- | --- | --- | --- |
| 2'b00 | D | H (Len=3) | H (Len=1) | 2'b01 | {D,H,H} | 3 |
| 2'b00 | X | X | H (Len=2) | 2'b01 | {X,X,H} | 1 |
| 2'b00 | X | D | H (Len=2) | 2'b00 | {X,D,H} | 2 |
| 2'b00 | D | X | H (Len=2) | 2'b00 | {X,D,H} | 2 |
| 2'b00 | H (Len=1) | D | H (Len=2) | 2'b00 | {H,D,H} | 3 |
| 2'b00 | H (Len=2) | D | H (Len=2) | 2'b01 | {H,D,H} | 3 |
| 2'b00 | H (Len=3) | D | H (Len=2) | 2'b10 | {H,D,H} | 3 |
| 2'b00 | X | X | H (Len=3) | 2'b10 | {X,X,H} | 1 |
| 2'b00 | X | D | H (Len=3) | 2'b01 | {X,D,H} | 2 |
| 2'b00 | D | D | H (Len=3) | 2'b00 | {D,D,H} | 3 |
| 2'b01 | X | X | X | 2'b01 | {X,X,X} | 0 |
| 2'b01 | D | X | X | 2'b00 | {X,X,D} | 1 |
| 2'b01 | X | D | X | 2'b00 | {X,X,D} | 1 |
| 2'b01 | H (Len=1) | D | X | 2'b00 | {X,H,D} | 2 |
| 2'b01 | H (Len=2) | D | X | 2'b01 | {X,H,D} | 2 |
| 2'b01 | H (Len=3) | D | X | 2'b10 | {X,H,D} | 2 |
| 2'b01 | X | X | D | 2'b00 | {X,X,D} | 1 |
| 2'b01 | X | H (Len=1) | D | 2'b00 | {X,H,D} | 2 |
| 2'b01 | H (Len=1) | H (Len=1) | D | 2'b00 | {H,H,D} | 3 |
| 2'b01 | H (Len=2) | H (Len=1) | D | 2'b01 | {H,H,D} | 3 |
| 2'b01 | H (Len=3) | H (Len=1) | D | 2'b10 | {H,H,D} | 3 |
| 2'b01 | X | H (Len=2) | D | 2'b01 | {X,H,D} | 2 |
| 2'b01 | D | H (Len=2) | D | 2'b00 | {D,H,D} | 3 |
| 2'b01 | X | H (Len=3) | D | 2'b10 | {X,H,D} | 2 |
| 2'b01 | D | H (Len=3) | D | 2'b01 | {D,H,D} | 3 |
| 2'b10 | X | X | X | 2'b10 | {X,X,X} | 0 |
| 2'b10 | D | X | X | 2'b10 | {D,X,X} | 1 |
| 2'b10 | X | D | X | 2'b01 | {X,X,D} | 1 |
| 2'b10 | D | D | X | 2'b00 | {X,D,D} | 2 |
| 2'b10 | X | X | D | 2'b01 | {X,X,D} | 1 |
| 2'b10 | X | D | D | 2'b00 | {X,D,D} | 2 |
| 2'b10 | D | X | D | 2'b00 | {X,D,D} | 2 |
| 2'b10 | H (Len=1) | D | D | 2'b00 | {H,D,D} | 3 |
| 2'b10 | H (Len=2) | D | D | 2'b01 | {H,D,D} | 3 |
| 2'b10 | H (Len=3) | D | D | 2'b10 | {H,D,D} | 3 |
(H: head flit; D: data flit; X: empty packet)
The data processing procedure according to an embodiment of the present invention, shown schematically in FIG. 4, is described below.
In FIG. 4, the four parallel processing modules (PPM, Parallel Process Module) process data internally in exactly the same way; only their inputs differ. PPM0 operates on flit0/1, PPM1 on flit2/3, PPM2 on flit4/5, and PPM3 on flit6/7. Each PPM contains three small parallel processing groups (PPG, Parallel Process Group), each with a different unprocessed-count identifier contx, which yields 3 different rearrangement results. The rearrangement result output for the current data segment is then selected according to the selection identifier contx1 output by the preceding data segment. The unprocessed-count identifier contx ranges from 0 to 2 and indicates the number of flits from the external input data that remain unprocessed. In addition, the selection identifier contx_q output by PPM3 for the last data segment is latched and used as the selection identifier contx1 of PPM0, which operates on flit0/1, in the next clock cycle. Because the parallel processing in the PPGs identifies data types, packets of data types other than NCS are all set to empty packets. Each alignment operation is thus a process of squeezing out unwanted "bubbles" (empty packets).
With reference to Table 3 and taking the 8 flits at clock N in Table 1 as an example, the processing performed by the data processing method and apparatus of the embodiments of the present invention is described in detail below.
Assume that the initial value of contx_q is 2'b00.
For PPM0, flit0 is the head flit (H) of an NCS1 packet and flit1 is an empty packet (X). Referring to Table 3, the parallel processing module produces 3 rearrangement combinations, one each for 2'b00, 2'b01 and 2'b10, and the result for the selection identifier contx1 = contx_q = 2'b00 is chosen as the output. According to Table 3, the output is: next selection identifier contx1 = 2'b00, 1 flit in the aligned data segment, and the aligned flits.
For PPM1, the selection identifier contx1 = 2'b00 output by PPM0 selects the result. The output is: next selection identifier contx = 2'b10, 1 flit in the aligned data segment, and the aligned flits.
For PPM2, the selection identifier contx1 = 2'b10 output by PPM1 selects the result. The output is: next selection identifier contx = 2'b01, 1 flit in the aligned data segment, and the aligned flits.
For PPM3, the selection identifier contx1 = 2'b01 output by PPM2 selects the result. The output is: next selection identifier contx = 2'b10, 2 flits in the aligned data segment, and the aligned flits.
The selection identifier contx1 = 2'b10 output by PPM3 is latched as contx_q and serves as the input of PPM0 in the next clock cycle (N+1). The operation on the 8 flits at clock (N+1) is similar to that described above.
After the alignment results of the four 2-flit segments are obtained, adjacent data segments are combined in pairs at the first level to produce two new data segments of 4 flits each, in which the "bubbles" between NCS-type flits have been removed.
Finally, the second-level combination produces the final output data of 8 flits, in which all "bubbles" between NCS-type flits have been removed. It is also known that the number of NCS-type flits among the 8 flits of the current clock is 5, which greatly facilitates subsequent storage operations.
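The walk-through for clock N can be replayed with a small software model. Everything in this sketch is an assumption made for illustration: the flit layout `clock_n` is one arrangement consistent with the values stated in the text (Table 1 itself is only available as an image), flits are written as `"X"`, `"others"`, `"D"` and `"H1"` to `"H3"`, the PPMs are evaluated one after another instead of speculatively in parallel, and the two-level combination is flattened into a single step, which gives the same final order.

```python
def align(segment, contx):
    """Squeeze bubbles out of one segment; return (aligned, count, next contx)."""
    kept = []
    for flit in segment:
        if flit in ("X", "others"):
            continue  # bubble or non-NCS flit: dropped
        if flit == "D" and contx > 0:
            contx -= 1
        elif flit.startswith("H") and contx == 0:
            contx = int(flit[1:]) - 1
        else:
            raise ValueError(f"unexpected {flit} with contx={contx}")
        kept.append(flit)
    return kept + ["X"] * (len(segment) - len(kept)), len(kept), contx


def process_clock(flits, contx_q):
    """Run PPM0..PPM3 over one 8-flit word; return (output, trace, new contx_q)."""
    segments, trace, contx = [], [], contx_q
    for i in range(0, len(flits), 2):
        aligned, count, contx = align(flits[i:i + 2], contx)
        segments.append(aligned)
        trace.append((count, contx))  # (flit count, outgoing contx) per PPM
    valid = [f for seg in segments for f in seg if f != "X"]
    return valid + ["X"] * (len(flits) - len(valid)), trace, contx


# Assumed flit0..flit7 for clock N: an NCS1, a full NCS3, and the head of a second NCS3.
clock_n = ["H1", "X", "H3", "others", "X", "D", "D", "H3"]
output, trace, contx_q = process_clock(clock_n, 0)
print(trace)    # [(1, 0), (1, 2), (1, 1), (2, 2)]
print(output)   # ['H1', 'H3', 'D', 'D', 'H3', 'X', 'X', 'X']
print(contx_q)  # 2
```

The trace matches the values given for PPM0 to PPM3 (outgoing identifiers 2'b00, 2'b10, 2'b01, 2'b10 and flit counts 1, 1, 1, 2), the counts sum to 5, and the value 2 is what would be latched as contx_q for clock N+1.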
As the specific embodiment above shows, the data processing method and apparatus of the embodiments of the present invention segment variable-length packets and align the segments in parallel, which makes the design code easier to maintain, improves code coverage during design verification, and significantly improves timing.
According to an embodiment of the present invention, a program product is provided that enables a processor running it to perform the following functions: first, divide, in order, the input data that corresponds to the current clock cycle and contains N data units into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; then, in parallel, perform an alignment operation on the data units of the first type in each of the M data segments, so that data units of the first type are shifted ahead of data units of other types, where data units of other types are all set to the empty-packet type, the first type is the packet type to be processed, and the other types are packet types that need not be processed; finally, combine the M aligned data segments into output data containing N data units.
The aligned output data obtained in each clock cycle is stored sequentially in a buffer. Because no empty-packet data units or data units of other types remain between the data units to be processed in the output data, storage space is saved effectively.
According to another embodiment of the present invention, an information storage medium is provided for storing the above program product.
本领域普通技术人员可以意识到, 结合本文中所公开的实施例描述的各 示例的单元及算法步骤, 能够以电子硬件、 或者计算机软件和电子硬件的结 合来实现。 这些功能究竟以硬件还是软件方式来执行, 取决于技术方案的特 定应用和设计约束条件。 专业技术人员可以对每个特定的应用来使用不同方 法来实现所描述的功能, 但是这种实现不应认为超出本发明的范围。  Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in a combination of electronic hardware or computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other ways of dividing, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A data processing method, comprising:
dividing, in order, input data that contains N data units and corresponds to the current clock cycle into M data segments, where M and N are both positive integers, N is greater than or equal to 2, and M is less than N;
performing, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted ahead of the data units of other types, where the data units of the other types are all set to the null packet type, the first type is the packet type that needs to be processed, and the other types are packet types that do not need to be processed; and
combining the M aligned data segments into output data containing N data units.
2. The method according to claim 1, wherein performing, in parallel, the alignment operation on the data units of the first type in each of the M data segments comprises:
performing alignment operations on the data units in the current data segment in parallel, one for each of X unprocessed-count flags, to obtain X rearranged combinations, where X corresponds to the maximum number of data units contained in a data packet of the first type, and each unprocessed-count flag indicates a number of unprocessed data units in the data packet of the first type; and
selecting one rearranged combination from the X rearranged combinations according to the selection flag output after the data unit alignment operation in the data segment preceding the current data segment, where the selection flag is one of the X unprocessed-count flags and indicates the number of unprocessed data units of the data packet of the first type in the current data segment.
3. The method according to claim 2, wherein performing, in parallel, the alignment operation on the data units of the first type in each of the M data segments further comprises:
latching the selection flag output by the last data segment, to serve as the selection flag for the alignment operation of the first data segment in the clock cycle following the current clock cycle.
4. The method according to claim 2 or 3, wherein performing alignment operations on the data units in the current data segment in parallel, one for each of the X unprocessed-count flags, comprises:
setting the data units of the other types in the current data segment to the null packet type.
5. The method according to any one of claims 1 to 4, wherein combining the M aligned data segments into output data containing N data units comprises: combining, stage by stage and in order, each two adjacent data segments after the alignment operation into output data containing N data units, so that among the N data units the data units of the first type are shifted ahead of the data units of the other types.
6. A data processing device, comprising:
a segmentation unit, configured to divide, in order, input data that contains N data units and corresponds to the current clock cycle into M data segments, where M and N are both positive integers, N is greater than or equal to 2, and M is less than N;
a parallel processing unit, configured to perform, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted ahead of the data units of other types, where the data units of the other types are all set to the null packet type, the first type is the packet type that needs to be processed, and the other types are packet types that do not need to be processed; and
a combination unit, configured to combine the M aligned data segments into output data containing N data units.
7. The device according to claim 6, wherein the parallel processing unit further comprises: a parallel processing module, configured to perform alignment operations on the data units in the current data segment in parallel, one for each of X unprocessed-count flags, to obtain X rearranged combinations, where X corresponds to the maximum number of data units contained in a data packet of the first type, and each unprocessed-count flag indicates a number of unprocessed data units in the data packet of the first type; and
a selection module, configured to select one rearranged combination from the X rearranged combinations according to the selection flag output after the data unit alignment operation in the data segment preceding the current data segment, where the selection flag is one of the X unprocessed-count flags and indicates the number of unprocessed data units of the data packet of the first type in the current data segment.
8. The device according to claim 7, wherein the parallel processing unit further comprises: a latch module, configured to latch the selection flag output by the last data segment, to serve as the selection flag for the alignment operation of the first data segment in the clock cycle following the current clock cycle.
9. The device according to claim 7 or 8, wherein the parallel processing module is further configured to set the data units of the other types in the current data segment to the null packet type.
10. The device according to any one of claims 6 to 9, wherein the combination unit is further configured to combine, stage by stage and in order, each two adjacent data segments after the alignment operation into output data containing N data units, so that among the N data units the data units of the first type are shifted ahead of the data units of the other types.
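The selection mechanism recited in claims 2, 3, 7 and 8 resembles a carry-select scheme: each segment computes one candidate alignment for every possible incoming unprocessed count, and the count actually output by the preceding segment then picks one candidate. The following is a minimal software sketch of that idea under stated assumptions, not the claimed circuit. The flit model (`"HDR"` for the first flit of a first-type packet carrying the packet length, `"BODY"` for later flits, `"OTHER"` for everything else), the choice X = 3, and all function names are hypothetical and used for illustration only.

```python
# Sketch of speculative per-segment alignment with flag selection
# (hypothetical flit model; X = 3 is an assumed maximum packet length).
NULL = ("NULL", 0)
X = 3  # possible incoming unprocessed counts are 0 .. X-1


def align_segment(segment, remaining):
    """Align one segment, assuming `remaining` flits of an open packet are pending.

    Returns the rearranged segment and the unprocessed count it hands on.
    """
    kept = []
    for kind, length in segment:
        if remaining > 0:            # continuation of the packet already open
            kept.append((kind, length))
            remaining -= 1
        elif kind == "HDR":          # a new first-type packet starts here
            kept.append((kind, length))
            remaining = length - 1   # the length is read from the first flit
        # any other flit is dropped, i.e. replaced by NULL padding below
    return kept + [NULL] * (len(segment) - len(kept)), remaining


def process_cycle(segments, latched_flag):
    """Align all segments of one clock cycle; return outputs and the flag to latch."""
    # Step 1: every segment computes X candidates; these are mutually independent.
    candidates = [[align_segment(seg, r) for r in range(X)] for seg in segments]
    # Step 2: the flag from the preceding segment selects one candidate per segment.
    outputs, flag = [], latched_flag
    for per_segment in candidates:
        aligned, flag = per_segment[flag]
        outputs.append(aligned)
    return outputs, flag             # the final flag is latched for the next cycle
```

In this model only the selection in step 2 depends on the neighbouring segment; the alignment work itself does not, which is the property that lets the segments be handled in parallel. The returned flag plays the role of the latched selection flag that seeds the first segment of the next clock cycle.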
PCT/CN2011/080280 2011-09-28 2011-09-28 Data processing method and device WO2012149775A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201180001883.2A CN102388385B (en) 2011-09-28 2011-09-28 Data processing method and device
PCT/CN2011/080280 WO2012149775A1 (en) 2011-09-28 2011-09-28 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/080280 WO2012149775A1 (en) 2011-09-28 2011-09-28 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2012149775A1 true WO2012149775A1 (en) 2012-11-08

Family

ID=45826496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080280 WO2012149775A1 (en) 2011-09-28 2011-09-28 Data processing method and device

Country Status (2)

Country Link
CN (1) CN102388385B (en)
WO (1) WO2012149775A1 (en)

Also Published As

Publication number Publication date
CN102388385A (en) 2012-03-21
CN102388385B (en) 2013-08-28

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001883.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864837

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11864837

Country of ref document: EP

Kind code of ref document: A1