WO2012149775A1

WO2012149775A1 - Data processing method and device

Info

Publication number: WO2012149775A1
Application number: PCT/CN2011/080280
Authority: WO
Inventors: 王工艺; 陈昊; 郑伟; 常胜
Original assignee: 华为技术有限公司
Priority date: 2011-09-28
Filing date: 2011-09-28
Publication date: 2012-11-08
Also published as: CN102388385A; CN102388385B

Abstract

Provided are a data processing method and device. The data processing method includes: dividing, in order, the input data containing N data units corresponding to the current clock cycle into M data segments, wherein M and N are both positive integers, N is greater than or equal to 2, and M is less than N; performing an alignment operation in parallel on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, wherein the data units of other types are all set to an empty data packet type, the first type is a data packet type to be processed, and the other types are types not to be processed; and combining the M aligned data segments into output data containing N data units. The data processing device includes a segmentation unit, a parallel processing unit and a combination unit. The data processing method and device in the embodiments of the present invention segment variable-length packets and perform the alignment operation on each segment in parallel, which makes the design code easier to maintain, increases code coverage during design code verification, and markedly improves timing.

Description

Data processing method and device

Embodiments of the present invention relate to the field of data processing and, more particularly, to a method and apparatus for data processing.

Background

In the design of an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), it is often necessary to process various protocol-defined data packets, which may belong to different types. These packets may have variable lengths even if they belong to the same type.

A variety of data packets are defined in the QPI (Quick Path Interconnect) protocol. Some of them have a fixed length and some have a variable length. For example, the NCS (Non Coherent Standard) packet is a variable-length packet, usually composed of 1 to 3 data units (i.e., flits), where each flit has a fixed length, for example 80 bits. The length of a variable-length NCS packet can be obtained from its first flit. For convenience of description, "NCS1" denotes an NCS packet with a length of 1 flit, that is, only one header flit; "NCS2" denotes an NCS packet with a length of 2 flits, that is, one header flit and one data flit; and "NCS3" denotes an NCS packet with a length of 3 flits, that is, one header flit and two data flits. In addition to NCS packets there are other types of data packets, such as empty data packets. These other types of packets can be fixed length or variable length.

In some ASIC or FPGA application design scenarios, aligned packets of the same type must be stored in one of several different buffers, which requires aligning the individual data units of packets of the same type; packets of other types are treated as empty packets in the alignment operation. The so-called "alignment operation" is the process of "squeezing out" the empty packets between these data units. Suppose the input of a certain clock cycle has N data units (where N is a positive integer); the alignment operation on the N data units then has to consider about 2^N shift possibilities. In addition, when the code of such a design is verified after the design is completed, if there are L data types in the input (where L is a positive integer), the input during verification has about L^N combinations. Obviously, it is not easy to exhaust all of these combinations. Moreover, in order to improve code coverage, the verifier has to verify the code with as many data unit combinations as possible, which inevitably increases the workload. At the same time, it is also difficult for the timing to meet the requirements of the system.

Summary of the invention

Embodiments of the present invention provide a data processing method and apparatus capable of performing alignment processing on a plurality of variable length packets based on a segmentation parallel mechanism.

In one aspect, a data processing method is provided, including: dividing, in order, input data including N data units corresponding to a current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N; performing an alignment operation in parallel on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of other types are all set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packet that do not need to be processed; and combining the M aligned data segments into output data containing N data units.

In another aspect, an apparatus for data processing is provided, including: a segmentation unit, configured to divide, in order, input data including N data units corresponding to a current clock cycle into M data segments, where M and N are both positive integers, N is greater than or equal to 2, and M is less than N; a parallel processing unit, configured to perform an alignment operation in parallel on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of other types are all set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packet that do not need to be processed; and a combination unit, configured to combine the M aligned data segments into output data containing N data units.

The data processing method and apparatus of the embodiments of the present invention segment the variable-length packets and perform alignment operations on the segments in parallel, thereby facilitating maintenance of the design code, improving code coverage during design code verification, and significantly improving timing.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method of data processing in accordance with an embodiment of the present invention.

FIG. 2 is a structural diagram of an apparatus for data processing according to an embodiment of the present invention.

FIG. 3 is a block diagram of a parallel processing unit in accordance with an embodiment of the present invention.

FIG. 4 is a schematic diagram of a specific process of data processing according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

To solve the problem that the alignment operation in the prior art is complicated and its timing is poor, the embodiments of the present invention propose dividing the input data of each clock cycle into multiple data segments, whose lengths may be equal or different. For example, one data segment includes 2 data units, another data segment includes 3 data units, and so on. In addition, individual data units may belong to different types, but data units of other types do not exist between data units of the same type. In each clock cycle, the alignment operation is performed in parallel on the data units of the same type in each data segment. After the aligned data units are obtained, the data segments are combined step by step, finally yielding the output data of each clock cycle, in which the data units of the same type are aligned. It should be noted that, since the alignment operation in the embodiments of the present invention targets only data units of one type, data units of different types can be aligned in separate alignment processing devices. That is, one alignment processing device aligns data units of only one type, and any data unit that the device cannot process is regarded as an empty data packet. It can be understood that the alignment operations for different types of data units can also run in parallel in the same clock cycle.

A method of data processing according to an embodiment of the present invention is described below with reference to FIG. 1.

Step 11. The input data including N data units corresponding to the current clock cycle is divided, in order, into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N.

In general, if N is an even number, a natural choice is to divide the N data units into M = N/2 data segments of 2 data units each. Since a segment of two data units has the fewest rearrangement cases to consider, using two data units per data segment is optimal.

If N is an odd number, a data unit that is an empty data packet may be appended at the end of the N data units, and the (N+1) data units are divided into M = (N+1)/2 data segments. Because the empty data packet would in any case be located at the end after the alignment operation on the N data units, the alignment result of the N data units is not affected. Alternatively, data segments of 3 data units and data segments of 2 data units can be combined to handle an odd number of data units.

Of course, those skilled in the art should also understand that for N data units, multiple data segment combinations of various data units can be employed.
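The segmentation step can be illustrated with a minimal Python sketch. The names `segment`, `seg_len` and the string markers for data units (`"X"` for an empty packet, `"H1"`/`"H2"`/`"H3"` for header flits, `"D"` for a data flit) are assumptions made for illustration and do not appear in the original text.

```python
from typing import List

EMPTY = "X"  # hypothetical marker for an empty-packet data unit


def segment(units: List[str], seg_len: int = 2) -> List[List[str]]:
    """Split N data units, in order, into segments of seg_len units.

    If N is not a multiple of seg_len, empty units are appended at the end
    first. They would sit at the tail after alignment anyway, so padding
    does not change the alignment result.
    """
    padded = list(units)
    while len(padded) % seg_len != 0:
        padded.append(EMPTY)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


# N = 8 (even): four segments of two units each.
print(segment(["H1", "X", "H3", "X", "D", "X", "D", "H3"]))
# N = 5 (odd): one empty unit is appended, giving three segments.
print(segment(["H2", "D", "X", "H1", "X"]))
```

Mixed segment lengths (for example one 3-unit segment plus 2-unit segments) would need a different splitting rule; the sketch covers only the equal-length case with padding.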

Step 12. An alignment operation is performed in parallel on the data units of the first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the first type is the packet type to be processed and the other types are packet types that do not need to be processed.

For each data segment, the data units in the current data segment are aligned in parallel based on each of X unprocessed quantity identifiers; that is, the data units of other types in the current data segment are set to the empty packet type, and the data units of the first type are shifted in front of the data units of the empty packet type, thereby obtaining X rearrangement combinations. Here, X corresponds to the maximum number of data units included in a data packet of the first type, and the unprocessed quantity identifier indicates the number of data units of a first-type data packet that have not yet been processed.

It should be noted that combining the data units in the data segment directly with the X unprocessed quantity identifiers to obtain the X rearrangement combinations also identifies the data type of the data units in the data segment. Since the number of data units in a data segment is limited, it is easy to exhaust all combinations of data units and unprocessed quantity identifiers. In the specific example below, input and output logic tables are given for two data units combined with three unprocessed quantity identifiers and for three data units combined with three unprocessed quantity identifiers. By referring to such a logic table, the alignment result of the data units in a data segment can be obtained directly.

In addition, the unprocessed quantity identifier is represented by a binary number. Taking the NCS packet data type as an example, 2'b00 indicates that there are no unprocessed NCS packet data units, 2'b01 indicates that there is still 1 unprocessed NCS packet data unit, and 2'b10 indicates that there are still 2 unprocessed NCS packet data units. As another example, if a data packet of another data type includes 5 data units, 3'b100, 3'b011, 3'b010 and 3'b001 can be used to indicate the number of unprocessed data units.
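As a rough illustration of aligning one data segment under a given unprocessed quantity identifier, the following Python sketch models the behaviour for NCS packets of 1 to 3 flits. The function name `align_segment`, the flit markers, and the rule that a flit not matching the expected state is treated as an empty packet are assumptions for illustration; the embodiment itself uses a logic table rather than a loop.

```python
from typing import List, Tuple

# Header flit marker -> packet length in flits (NCS1, NCS2, NCS3).
HEADER_LEN = {"H1": 1, "H2": 2, "H3": 3}


def align_segment(seg: List[str], contx: int) -> Tuple[List[str], int, int]:
    """Align one data segment for a given unprocessed quantity identifier.

    seg[0] is the first flit of the segment. Returns the aligned segment
    (first-type flits in front, empty packets behind), the identifier to
    pass to the next segment, and the number of first-type flits kept.
    """
    kept: List[str] = []
    remaining = contx
    for flit in seg:
        if remaining > 0 and flit == "D":
            kept.append(flit)          # data flit of a packet still in progress
            remaining -= 1
        elif remaining == 0 and flit in HEADER_LEN:
            kept.append(flit)          # header of a new packet
            remaining = HEADER_LEN[flit] - 1
        # Assumption: any other flit is treated as an empty packet and dropped.
    aligned = kept + ["X"] * (len(seg) - len(kept))
    return aligned, remaining, len(kept)


aligned, next_contx, count = align_segment(["X", "H2"], 0)
print(aligned, format(next_contx, "02b"), count)
```

Calling the function once for each of the identifiers 0, 1 and 2 yields the X = 3 rearrangement combinations for a segment.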

After the X rearrangement combinations are obtained in parallel, one of them is selected according to the selection identifier that was output after the alignment operation on the data units of the data segment preceding the current data segment. The selection identifier is one of the X unprocessed quantity identifiers and indicates the number of unprocessed data units of the first-type data packet in the current data segment.

Thereafter, the selection identifier output by the last data segment is latched and used as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.
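The compute-all-then-select idea can be sketched in Python as follows. Every segment first produces one candidate per identifier without knowing which one will be needed; a short selection chain then picks the right candidate for each segment and returns the value to be latched. The names `candidates`, `select_chain` and `X_IDS` are hypothetical, and the loop that stands in for the parallel hardware is only a software model.

```python
from typing import Dict, List, Tuple

Result = Tuple[List[str], int, int]  # (aligned flits, next identifier, kept count)
PKT_LEN = {"H1": 1, "H2": 2, "H3": 3}
X_IDS = (0, 1, 2)  # 2'b00, 2'b01, 2'b10 for packets of at most 3 flits


def candidates(seg: List[str]) -> Dict[int, Result]:
    """Compute one rearrangement per identifier; none depends on another."""
    out: Dict[int, Result] = {}
    for contx in X_IDS:
        kept: List[str] = []
        rem = contx
        for flit in seg:
            if rem > 0 and flit == "D":
                kept.append(flit)
                rem -= 1
            elif rem == 0 and flit in PKT_LEN:
                kept.append(flit)
                rem = PKT_LEN[flit] - 1
        out[contx] = (kept + ["X"] * (len(seg) - len(kept)), rem, len(kept))
    return out


def select_chain(segments: List[List[str]], contx_q: int) -> Tuple[List[Result], int]:
    """Select one candidate per segment; return the picks and the value to latch."""
    all_cands = [candidates(seg) for seg in segments]  # independent per segment
    picks: List[Result] = []
    sel = contx_q
    for cands in all_cands:            # only a selection, no realignment
        result = cands[sel]
        picks.append(result)
        sel = result[1]                # becomes the next segment's selector
    return picks, sel


picks, latch = select_chain([["H3", "X"], ["D", "D"]], contx_q=0)
print(picks, latch)
```

The design choice illustrated here is that the expensive work (rearranging flits) never waits on the previous segment; only the cheap selection is serial.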

Step 13. The M aligned data segments are combined into output data including N data units. Optionally, adjacent pairs of aligned data segments are combined, in order, into the output data including the N data units, so that the data units of the first type among the N data units are shifted in front of the data units of other types.

For example, adjacent data segments are first combined in pairs to obtain new data segments, and adjacent new data segments are then combined in pairs, finally yielding output data including N data units.
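A minimal Python sketch of the pairwise combination is given below. It assumes each aligned segment is carried together with its count of valid flits; the names `combine` and `combine_all` and the tuple layout are illustrative assumptions, and the input list is assumed to be non-empty.

```python
from typing import List, Tuple

Aligned = Tuple[List[str], int]  # (flits with valid ones first, valid count)


def combine(a: Aligned, b: Aligned) -> Aligned:
    """Merge two adjacent aligned segments so valid flits stay contiguous."""
    (flits_a, n_a), (flits_b, n_b) = a, b
    total = len(flits_a) + len(flits_b)
    valid = flits_a[:n_a] + flits_b[:n_b]
    return valid + ["X"] * (total - len(valid)), n_a + n_b


def combine_all(segs: List[Aligned]) -> Aligned:
    """Combine adjacent pairs level by level (4 -> 2 -> 1 for 8 flits)."""
    level = list(segs)
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:              # an unpaired last segment is carried up
            nxt.append(level[-1])
        level = nxt
    return level[0]


out, n = combine_all([(["H1", "X"], 1), (["H3", "X"], 1),
                      (["D", "X"], 1), (["D", "H3"], 2)])
print(out, n)
```

Because each input is already aligned, a merge only has to shift the second segment by the first segment's valid count, which is far simpler than aligning all N units at once.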

The data processing method of the embodiment of the present invention segments the variable length packets and performs alignment operations on the segments in parallel, thereby facilitating maintenance of the design code, and improving the code coverage rate during design code verification, and significantly improving the timing.

Referring to Figure 2, an apparatus for data processing in accordance with an embodiment of the present invention is described.

As shown in Fig. 2, the apparatus 20 for data processing includes a segmentation unit 21, a parallel processing unit 22, and a combination unit 23.

Specifically, the segmentation unit 21 is configured to divide, in order, the input data including N data units corresponding to the current clock cycle into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N. The parallel processing unit 22 is configured to perform an alignment operation in parallel on the data units of the first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of other types can be set to the empty packet type. Here, the first type is the type of packet to be processed, and the other types are types of packet that do not need to be processed. The combination unit 23 is configured to combine the M aligned data segments into output data including N data units.

Parallel processing can also be implemented within each data segment, where the type of the data is identified. For example, as shown in FIG. 3, the parallel processing unit 22 further includes a parallel processing module 221 and a selection module 222. The parallel processing module 221 is configured to perform an alignment operation on the data units in the current data segment in parallel, based on each of the X unprocessed quantity identifiers, to obtain X rearrangement combinations, where X corresponds to the maximum number of data units included in a data packet of the first type, and the unprocessed quantity identifier indicates the number of unprocessed data units of a first-type data packet. The parallel processing module 221 identifies the data type by comparison with the logic table, and thereby sets the data units of other types in the current data segment to the empty packet type. The selection module 222 is configured to select one rearrangement combination from the X rearrangement combinations according to the selection identifier output after the alignment operation on the data units of the data segment preceding the current data segment, where the selection identifier is one of the X unprocessed quantity identifiers and indicates the number of unprocessed data units of the first-type data packet in the current data segment.

In addition, the parallel processing unit 22 may further include a latch module 223 for latching the selection identifier output by the last data segment, to be used as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.

Finally, after the data units in the respective data segments have been aligned, the combination unit 23 of the data processing apparatus 20 can combine adjacent pairs of aligned data segments, in order, into output data including N data units, so that the data units of the first type among the N data units are shifted in front of the data units of other types.

Therefore, the apparatus for data processing of the embodiment of the present invention segments the variable-length packets and performs alignment operations on the segments in parallel, thereby facilitating maintenance of the design code, improving code coverage during design code verification, and significantly improving timing.

The implementation of data processing in accordance with an embodiment of the present invention is described below in a specific example of some FPGA application design scenarios.

To meet the rate requirement in an FPGA application design, 8 flits need to be processed and stored in parallel. These 8 flits can be a combination of protocol packets of any type, so there are many possibilities for the packets of one type contained in the 8 flits. Taking the NCS packet as an example: the 8 flits may contain 0 to 8 NCS1 packets, or 0 to 4 NCS2 packets, or 0 to 3 NCS3 packets, or a combination of NCS1, NCS2 and NCS3 packets, or a combination of NCS packets and other packets, and their positions within the 8 flits are not fixed. Moreover, an NCS2 or NCS3 packet may span multiple groups of 8 flits.

Table 1 below gives schematic input data for 8 flits corresponding to clocks N to N+3.

[Table 1] Parallel 8 flit schematic input data

Here, "X" denotes a don't-care "bubble", that is, an empty packet, and "others" denotes packets of data types other than NCS. The suffixes "_1", "_2" and "_3" indicate the first, second and third flit of an NCS packet, respectively.

As shown in Table 1, at clock N, the eight flits processed at the same time contain three packets of the NCS type, that is, one NCS1 and two NCS3s. One of the NCS3 packets spans two groups of 8 flits: only one of its flits is in the 8 flits currently processed, and the other 2 flits are not. At clock N+1, there is only one NCS-type flit in the 8 flits processed at the same time, and it is necessary to know that this flit is the first data flit of the NCS3 packet received in the previous clock cycle. And so on.

The output data obtained by applying the segmented parallel data processing method of the embodiment of the present invention to the input shown in Table 1 is shown in Table 2 below.

[Table 2] Parallel 8 flit schematic output data after processing

By comparison, since only one type of data is processed (only NCS-type data in this example), packets of types other than NCS are identified as empty packets. The segmented parallel data processing is the process of "squeezing out" empty packets segment by segment and in parallel.

With the method and apparatus for data processing of the embodiment of the present invention, the most streamlined design is assumed. First, the input data of 8 flits is divided into data segments of 2 flits each, producing 4 data segments in total. Then, the four data segments are aligned in parallel to obtain four aligned data segments. Next, the four aligned data segments are combined in pairs, in parallel, into two new data segments, each with four aligned flits. Finally, the two new data segments are combined into output data with 8 aligned flits.

Specifically, in the parallel alignment of the four data segments, the two flits in each data segment are also aligned in parallel in combination with each unprocessed quantity identifier, giving three rearrangement combinations. As mentioned above, after segmentation the flits of one NCS packet may be split across different data segments, so the "unprocessed quantity identifier" is used to indicate how many flits of the NCS packet being processed have not yet been processed. In this example, since an NCS packet has at most 3 flits (NCS3), the binary numbers 00, 01 and 10 are used to indicate that the number of unprocessed flits remaining for the next data segment is 0, 1 or 2, respectively.

Table 3 below shows all 22 input and output combinations for a parallel processing module that aligns 2 flits.

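[Table 3] Input and output combinations of PPM processing (2 flits)

| Input contx | Input Flit1 | Input Flit0 | Output contx | Output data | Flit number |
|---|---|---|---|---|---|
| 2'b00 | X | X | 2'b00 | {X,X} | 0 |
| 2'b00 | H (Len=1) | X | 2'b00 | {X,H} | 1 |
| 2'b00 | H (Len=2) | X | 2'b01 | {X,H} | 1 |
| 2'b00 | H (Len=3) | X | 2'b10 | {X,H} | 1 |
| 2'b00 | X | H (Len=1) | 2'b00 | {X,H} | 1 |
| 2'b00 | H (Len=1) | H (Len=1) | 2'b00 | {H,H} | 2 |
| 2'b00 | H (Len=2) | H (Len=1) | 2'b01 | {H,H} | 2 |
| 2'b00 | H (Len=3) | H (Len=1) | 2'b10 | {H,H} | 2 |
| 2'b00 | X | H (Len=2) | 2'b01 | {X,H} | 1 |
| 2'b00 | D | H (Len=2) | 2'b00 | {D,H} | 2 |
| 2'b00 | X | H (Len=3) | 2'b10 | {X,H} | 1 |
| 2'b00 | D | H (Len=3) | 2'b01 | {D,H} | 2 |
| 2'b01 | X | X | 2'b01 | {X,X} | 0 |
| 2'b01 | D | X | 2'b00 | {X,D} | 1 |
| 2'b01 | X | D | 2'b00 | {X,D} | 1 |
| 2'b01 | H (Len=1) | D | 2'b00 | {H,D} | 2 |
| 2'b01 | H (Len=2) | D | 2'b01 | {H,D} | 2 |
| 2'b01 | H (Len=3) | D | 2'b10 | {H,D} | 2 |
| 2'b10 | X | X | 2'b10 | {X,X} | 0 |
| 2'b10 | D | X | 2'b01 | {X,D} | 1 |
| 2'b10 | X | D | 2'b01 | {X,D} | 1 |
| 2'b10 | D | D | 2'b00 | {D,D} | 2 |

(H: header flit; D: data flit; X: empty packet)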
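The count of 22 combinations for 2 flits can be cross-checked with a short Python enumeration. The validity rule used here is an assumption inferred from the rows of the table: a data flit is only legal while flits of a packet are still outstanding, a header flit is only legal when none are outstanding, and an empty packet is legal anywhere. The names `is_valid`, `SYMBOLS` and `PKT_LEN` are hypothetical. The rule has only been checked against the 2-flit table.

```python
from itertools import product

SYMBOLS = ("X", "D", "H1", "H2", "H3")
PKT_LEN = {"H1": 1, "H2": 2, "H3": 3}


def is_valid(seg, contx):
    """True if the flit sequence is consistent with `contx` flits still owed."""
    rem = contx
    for flit in seg:                   # seg[0] is Flit0
        if flit == "D":
            if rem == 0:
                return False           # data flit without a pending packet
            rem -= 1
        elif flit in PKT_LEN:
            if rem > 0:
                return False           # new header before the packet finished
            rem = PKT_LEN[flit] - 1
    return True


rows = [(contx, seg)
        for contx in (0, 1, 2)
        for seg in product(SYMBOLS, repeat=2)
        if is_valid(seg, contx)]
per_contx = {c: sum(1 for r in rows if r[0] == c) for c in (0, 1, 2)}
print(len(rows), per_contx)   # 22 {0: 12, 1: 6, 2: 4}
```

The split of 12, 6 and 4 rows matches the three input-identifier groups of the table, which is what makes exhaustive verification of a single 2-flit module practical.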

As can be seen from Table 3, a parallel processing module that processes two flits only needs to be designed for the various combinations of the two flits currently being processed. The two flits currently processed may belong to a packet whose earlier flits were handled in the previous data segment, and it must be known exactly which flits of the packet they are. The number of remaining flits produced by each data segment therefore has to be passed to the next data segment, and the number of remaining flits produced by the last data segment is used as the input of the first data segment in the next clock cycle, so that the subsequent processing of 8 flits is not blocked. At the same time, the segmentation operation transforms the combinations of 8 flits into the outputs of 4 data segments in which the flits are aligned, and these data segments are then combined in pairs, which reduces complexity. In addition, since the alignment operations on the data segments are all performed in parallel, the timing is greatly improved.

For the same reason, Table 4 gives the alignment processing of a parallel processing module for three flits. There are only a limited 42 combinations.

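[Table 4] Input and output combinations of PPM processing (3 flits)

| Input contx | Input Flit2 | Input Flit1 | Input Flit0 | Output contx | Output data | Flit number |
|---|---|---|---|---|---|---|
| 2'b00 | D | H (Len=3) | H (Len=1) | 2'b01 | {D,H,H} | 3 |
| 2'b00 | X | X | H (Len=2) | 2'b01 | {X,X,H} | 1 |
| 2'b00 | X | D | H (Len=2) | 2'b00 | {X,D,H} | 2 |
| 2'b00 | D | X | H (Len=2) | 2'b00 | {X,D,H} | 2 |
| 2'b00 | H (Len=1) | D | H (Len=2) | 2'b00 | {H,D,H} | 3 |
| 2'b00 | H (Len=2) | D | H (Len=2) | 2'b01 | {H,D,H} | 3 |
| 2'b00 | H (Len=3) | D | H (Len=2) | 2'b10 | {H,D,H} | 3 |
| 2'b00 | X | X | H (Len=3) | 2'b10 | {X,X,H} | 1 |
| 2'b00 | X | D | H (Len=3) | 2'b01 | {X,D,H} | 2 |
| 2'b00 | D | D | H (Len=3) | 2'b00 | {D,D,H} | 3 |
| 2'b01 | X | X | X | 2'b01 | {X,X,X} | 0 |
| 2'b01 | D | X | X | 2'b00 | {X,X,D} | 1 |
| 2'b01 | X | D | X | 2'b00 | {X,X,D} | 1 |
| 2'b01 | H (Len=1) | D | X | 2'b00 | {X,H,D} | 2 |
| 2'b01 | H (Len=2) | D | X | 2'b01 | {X,H,D} | 2 |
| 2'b01 | H (Len=3) | D | X | 2'b10 | {X,H,D} | 2 |
| 2'b01 | X | X | D | 2'b00 | {X,X,D} | 1 |
| 2'b01 | X | H (Len=1) | D | 2'b00 | {X,H,D} | 2 |
| 2'b01 | H (Len=1) | H (Len=1) | D | 2'b00 | {H,H,D} | 3 |
| 2'b01 | H (Len=2) | H (Len=1) | D | 2'b01 | {H,H,D} | 3 |
| 2'b01 | H (Len=3) | H (Len=1) | D | 2'b10 | {H,H,D} | 3 |
| 2'b01 | X | H (Len=2) | D | 2'b01 | {X,H,D} | 2 |
| 2'b01 | D | H (Len=2) | D | 2'b00 | {D,H,D} | 3 |
| 2'b01 | X | H (Len=3) | D | 2'b10 | {X,H,D} | 2 |
| 2'b01 | D | H (Len=3) | D | 2'b01 | {D,H,D} | 3 |
| 2'b10 | X | X | X | 2'b10 | {X,X,X} | 0 |
| 2'b10 | D | X | X | 2'b10 | {D,X,X} | 1 |
| 2'b10 | X | D | X | 2'b01 | {X,X,D} | 1 |
| 2'b10 | D | D | X | 2'b00 | {X,D,D} | 2 |
| 2'b10 | X | X | D | 2'b01 | {X,X,D} | 1 |
| 2'b10 | X | D | D | 2'b00 | {X,D,D} | 2 |
| 2'b10 | D | X | D | 2'b00 | {X,D,D} | 2 |
| 2'b10 | H (Len=1) | D | D | 2'b00 | {H,D,D} | 3 |
| 2'b10 | H (Len=2) | D | D | 2'b01 | {H,D,D} | 3 |
| 2'b10 | H (Len=3) | D | D | 2'b10 | {H,D,D} | 3 |

(H: header flit; D: data flit; X: empty packet)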

The process of data processing according to an embodiment of the present invention is described below with reference to FIG. 4, which is a schematic diagram of that process.

In FIG. 4, the internal processing of the four parallel processing modules (PPM, Parallel Process Module) is exactly the same; only the input of each parallel processing module differs. PPM0 operates on flit0/1, PPM1 operates on flit2/3, PPM2 operates on flit4/5, and PPM3 operates on flit6/7. Inside each PPM there are three small parallel processing groups (PPG, Parallel Process Group). The unprocessed quantity identifier contx of each parallel processing group is different, so three different rearrangement combinations result. Then, according to the selection identifier contx1 output by the previous data segment, the rearrangement combination to be output by the current data segment is selected. The unprocessed quantity identifier contx ranges from 0 to 2 and indicates the number of remaining flits of the external input data that have not yet been processed. In addition, the selection identifier contx_q of the last data segment, PPM3, is latched and used as the selection identifier contx1 of PPM0 for the flit0/1 operation in the next clock cycle. Since the parallel processing of the PPGs identifies the data types, packets of data types other than the NCS packet type are set to empty packets. It can be seen that each alignment operation is a process of squeezing out unwanted "bubbles" (empty packets).

Referring to Table 3, the processing of the data processing method and apparatus according to the embodiment of the present invention is described in detail below, taking the 8 flits at clock N in Table 1 as an example.

Assume that the initial value of contx_q is 2'b00.

For PPM0, flit0 is the header flit (H) of an NCS1 packet and flit1 is an empty packet (X). Referring to Table 3, the parallel processing module obtains 3 rearrangement combinations, corresponding to 2'b00, 2'b01 and 2'b10, but selects as its output the result for the selection identifier contx1 = contx_q = 2'b00. As can be seen from Table 3, the output is: the next selection identifier contx1 = 2'b00, the number of flits in the aligned data segment (1), and the result of the flit alignment.

For PPM1, the selection identifier contx1 = 2'b00 output by PPM0 is used to select the result. The output is: the next selection identifier contx = 2'b10, the number of flits in the aligned data segment (1), and the result of the flit alignment.

For PPM2, the selection identifier contx1 = 2'b10 output by PPM1 is used to select the result. The output is: the next selection identifier contx = 2'b01, the number of flits in the aligned data segment (1), and the result of the flit alignment.

For PPM3, the selection identifier contx1 = 2'b01 output by PPM2 is used to select the result. The output is: the next selection identifier contx = 2'b10, the number of flits in the aligned data segment (2), and the result of the flit alignment.

The selection identifier output by PPM3, contx1 = 2'b10, is latched as contx_q and used as the input to PPM0 in the next clock cycle (N+1). The 8 flits at clock (N+1) are processed in the same way as described above.

After the alignment results of the four 2-flit data segments are obtained, adjacent data segments are combined in pairs in the first combining stage to obtain two new data segments. Each new data segment has four flits, and the "bubbles" between the NCS-type flits within those four flits are gone.

Finally, after the second combining stage, the final output data including 8 flits is obtained, and all the "bubbles" between the NCS-type flits in the 8 flits are gone. It can also be seen that the number of NCS-type flits in the eight flits of the current clock is five. This provides great convenience for subsequent storage operations.
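The walk-through for clock N can be replayed with a small Python model. The input list below is an assumption: it is one arrangement of flits consistent with the identifiers and flit counts stated above (with packets of other types already replaced by `"X"`), not a copy of Table 1. The function names `ppm` and `process_clock` are hypothetical, the PPMs are evaluated one after another with the known selector instead of speculatively, and the two combining stages are collapsed into a single list operation.

```python
PKT_LEN = {"H1": 1, "H2": 2, "H3": 3}


def ppm(seg, contx):
    """Align one 2-flit segment; returns (aligned, next contx, NCS flit count)."""
    kept, rem = [], contx
    for flit in seg:
        if rem > 0 and flit == "D":
            kept.append(flit)
            rem -= 1
        elif rem == 0 and flit in PKT_LEN:
            kept.append(flit)
            rem = PKT_LEN[flit] - 1
    return kept + ["X"] * (len(seg) - len(kept)), rem, len(kept)


def process_clock(flits, contx_q):
    """Model PPM0..PPM3 plus combination for one 8-flit input."""
    segs = [flits[i:i + 2] for i in range(0, 8, 2)]
    results, sel = [], contx_q
    for seg in segs:
        aligned, sel_next, count = ppm(seg, sel)
        results.append((aligned, sel, sel_next, count))
        sel = sel_next
    valid = [f for aligned, _, _, count in results for f in aligned[:count]]
    output = valid + ["X"] * (8 - len(valid))
    return results, output, sel        # sel is the value latched as contx_q


# Assumed input for clock N: flit0 .. flit7.
clock_n = ["H1", "X", "H3", "X", "D", "X", "D", "H3"]
results, output, contx_q = process_clock(clock_n, contx_q=0)
for i, (aligned, sel_in, sel_out, count) in enumerate(results):
    print(f"PPM{i}: in={sel_in:02b} out={sel_out:02b} count={count} {aligned}")
print(output, format(contx_q, "02b"))
```

Under this assumed input the model reproduces the identifier sequence 00, 10, 01, 10, the flit counts 1, 1, 1, 2, and a total of five NCS-type flits at the front of the output.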

It can be seen from the above specific embodiments that the data processing method and apparatus of the embodiments of the present invention segment the variable-length packets and perform alignment operations on the segments in parallel, thereby facilitating maintenance of the design code, improving code coverage during design code verification, and significantly improving timing.

According to an embodiment of the present invention, a program product is provided, which enables a processor running the program product to implement the following functions. First, the input data including N data units corresponding to the current clock cycle is divided, in order, into M data segments, where M and N are both positive integers, N is greater than or equal to 2, and M is less than N. Then, an alignment operation is performed in parallel on the data units of the first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of other types are set to the empty packet type, the first type is the type of packet to be processed, and the other types are types of packet that do not need to be processed. Finally, the M aligned data segments are combined into output data containing N data units.

The aligned output data thus obtained in each clock cycle is stored in the buffer in sequence. Since there are no empty-packet data units or data units of other types between the data units to be processed in the output data, storage space can be saved effectively.
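The storage saving can be pictured with a minimal Python sketch in which only the leading valid flits of each aligned output are appended to a buffer. The class `NcsBuffer`, its `write` method, and the two example outputs are hypothetical and only illustrate the idea; the original text does not specify how the buffer is organised.

```python
from typing import List


class NcsBuffer:
    """Hypothetical per-type buffer that stores only the aligned valid flits."""

    def __init__(self) -> None:
        self.slots: List[str] = []

    def write(self, aligned: List[str], count: int) -> None:
        # The first `count` entries are valid flits; the tail is empty packets.
        self.slots.extend(aligned[:count])


buf = NcsBuffer()
buf.write(["H1", "H3", "D", "D", "H3", "X", "X", "X"], 5)   # assumed clock N
buf.write(["D", "D", "X", "X", "X", "X", "X", "X"], 2)      # assumed clock N+1
print(len(buf.slots))  # 7 slots used instead of 16
```

Because the valid flits are contiguous, the flit count alone tells the writer how far to advance, with no per-flit valid check.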

According to another embodiment of the present invention, an information storage medium for storing the above program product is provided.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in a combination of electronic hardware or computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

A person skilled in the art can clearly understand that the specific working process of the system, the device and the unit described above can be referred to the corresponding process in the foregoing method embodiments for the convenience and brevity of the description, and details are not described herein again.

In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or of another form. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims

1. A data processing method, comprising:

sequentially dividing input data that corresponds to a current clock cycle and includes N data units into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N;

performing, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of the other types are all set to an empty data packet type, the first type is a data packet type that needs to be processed, and the other types are data packet types that do not need to be processed; and

combining the M aligned data segments into output data including N data units.
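The three steps of claim 1 can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions, not the hardware design of the embodiments: the names `EMPTY`, `split`, `align`, and `process` are hypothetical, each data unit is assumed to carry its own type tag, the segments are assumed to be of equal size, and the combining step is modeled as plain concatenation. In hardware the per-segment alignments would run concurrently; the list comprehension here only models their independence.

```python
EMPTY = None  # hypothetical marker for the empty data packet type


def split(units, m):
    """Divide the N input units, in order, into M equal-sized segments (assumption)."""
    size, rem = divmod(len(units), m)
    if rem:
        raise ValueError("this sketch assumes N is a multiple of M")
    return [units[i * size:(i + 1) * size] for i in range(m)]


def align(segment, is_first_type):
    """Move first-type units to the front; every other unit becomes EMPTY."""
    kept = [u for u in segment if is_first_type(u)]
    return kept + [EMPTY] * (len(segment) - len(kept))


def process(units, m, is_first_type):
    """Split, align each segment independently, then concatenate the results."""
    n = len(units)
    if n < 2 or not 1 <= m < n:
        raise ValueError("requires N >= 2 and 1 <= M < N")
    aligned = [align(seg, is_first_type) for seg in split(units, m)]
    return [u for seg in aligned for u in seg]
```

For example, with units tagged `("A", payload)` for the type to be processed, calling `process(units, 2, lambda u: u[0] == "A")` on eight units yields two aligned four-unit segments back to back, each with its `"A"` units first and `EMPTY` entries after them.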

2. The method according to claim 1, wherein performing, in parallel, the alignment operation on the data units of the first type in each of the M data segments comprises:

performing, in parallel, alignment operations on the data units in a current data segment based on each of X unprocessed-quantity identifiers respectively, to obtain X rearrangement combinations, where X corresponds to the maximum number of data units included in a data packet of the first type, and each unprocessed-quantity identifier indicates a number of unprocessed data units in the data packet of the first type; and

selecting one rearrangement combination from the X rearrangement combinations according to a selection identifier that is output after the alignment operation on the data units in the data segment preceding the current data segment, where the selection identifier is one of the X unprocessed-quantity identifiers and indicates the number of unprocessed data units of the data packet of the first type in the current data segment.

3. The method according to claim 2, wherein performing, in parallel, the alignment operation on the data units of the first type in each of the M data segments further comprises:

latching the selection identifier output for the last data segment as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.
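The speculative alignment, selection, and latching recited in claims 2 and 3 resemble a carry-select structure, and can be sketched in Python as follows. Everything in this model is an assumption made for illustration: a unit is either a header `("H", type, length_in_units)` or a body unit `("B", payload)`, the X identifiers are taken to be the values 0 to X-1, no packet of the first type is longer than X units, and the names `EMPTY`, `FIRST`, `align_with_carry`, and `process_cycle` are hypothetical.

```python
EMPTY = ("E",)  # hypothetical marker for the empty data packet type
FIRST = "A"     # hypothetical tag of the packet type that must be processed


def align_with_carry(segment, carry_in):
    """Align one segment, assuming its first `carry_in` units still belong
    to a first-type packet that started earlier.

    Returns (aligned_segment, carry_out), where carry_out is the number of
    units of a first-type packet that are still unprocessed afterwards.
    """
    kept, remaining = [], carry_in
    for unit in segment:
        if remaining > 0:                       # continuation of a first-type packet
            kept.append(unit)
            remaining -= 1
        elif unit[0] == "H" and unit[1] == FIRST:
            kept.append(unit)                   # a new first-type packet starts here
            remaining = unit[2] - 1
        # every other unit is dropped and later replaced by EMPTY
    return kept + [EMPTY] * (len(segment) - len(kept)), remaining


def process_cycle(units, m, x, latched_sel):
    """Model one clock cycle; returns (aligned_segments, selection_to_latch)."""
    size = len(units) // m
    segments = [units[i * size:(i + 1) * size] for i in range(m)]
    # Speculation: X candidate alignments per segment, one per possible
    # identifier value. These are mutually independent (parallel in hardware).
    candidates = [[align_with_carry(seg, r) for r in range(x)] for seg in segments]
    # Selection: only this short chain depends on the preceding segment.
    sel, out = latched_sel, []
    for cand in candidates:
        aligned, sel = cand[sel]
        out.append(aligned)
    return out, sel  # sel is latched for the first segment of the next cycle
```

The returned `sel` plays the role of the latched selection identifier: the caller passes it back as `latched_sel` for the next clock cycle. The point of the design is that the expensive rearrangement no longer waits for the preceding segment; only the final selection does.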

4. The method according to claim 2 or 3, wherein performing, in parallel, the alignment operations on the data units in the current data segment based on each of the X unprocessed-quantity identifiers respectively comprises:

setting the data units of the other types in the current data segment to the empty data packet type.

5. The method according to any one of claims 1 to 4, wherein combining the M aligned data segments into output data including N data units comprises: sequentially combining each two adjacent data segments after the alignment operation into the output data including N data units, such that the data units of the first type among the N data units are shifted in front of the data units of the other types.
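The pairwise combination of claim 5 can be sketched as a left-to-right merge of already aligned segments. This is an illustrative model only: the names `EMPTY`, `merge_pair`, and `combine` are hypothetical, and the sketch assumes each input segment already has its first-type units at the front followed by empty units. Whether the actual design merges sequentially or in a tree is not stated in the claim; a sequential fold is assumed here.

```python
EMPTY = None  # hypothetical marker for the empty data packet type


def merge_pair(left, right):
    """Merge two aligned segments so that all valid units precede all EMPTY units."""
    valid = [u for u in left if u is not EMPTY] + [u for u in right if u is not EMPTY]
    return valid + [EMPTY] * (len(left) + len(right) - len(valid))


def combine(aligned_segments):
    """Fold the segments in order; the result has N units, valid units first."""
    out = []
    for seg in aligned_segments:
        out = merge_pair(out, seg)
    return out
```

The order of the valid units is preserved, so the output is the same sequence of first-type data units as in the input, packed at the front of the N-unit word.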

6. A data processing device, comprising:

a segmentation unit, configured to sequentially divide input data that corresponds to a current clock cycle and includes N data units into M data segments, where M and N are positive integers, N is greater than or equal to 2, and M is less than N;

a parallel processing unit, configured to perform, in parallel, an alignment operation on the data units of a first type in each of the M data segments, so that the data units of the first type are shifted in front of the data units of other types, where the data units of the other types are all set to an empty data packet type, the first type is a data packet type that needs to be processed, and the other types are data packet types that do not need to be processed; and

a combination unit, configured to combine the M aligned data segments into output data including N data units.

7. The apparatus according to claim 6, wherein the parallel processing unit further comprises:

a parallel processing module, configured to perform, in parallel, alignment operations on the data units in a current data segment based on each of X unprocessed-quantity identifiers respectively, to obtain X rearrangement combinations, where X corresponds to the maximum number of data units included in a data packet of the first type, and each unprocessed-quantity identifier indicates a number of unprocessed data units in the data packet of the first type; and

a selection module, configured to select one rearrangement combination from the X rearrangement combinations according to a selection identifier that is output after the alignment operation on the data units in the data segment preceding the current data segment, where the selection identifier is one of the X unprocessed-quantity identifiers and indicates the number of unprocessed data units of the data packet of the first type in the current data segment.

8. The apparatus according to claim 7, wherein the parallel processing unit further comprises: a latching module, configured to latch the selection identifier output for the last data segment as the selection identifier for the alignment operation on the first data segment in the clock cycle following the current clock cycle.

9. The apparatus according to claim 7 or 8, wherein the parallel processing module is further configured to set the data units of the other types in the current data segment to the empty data packet type.

10. The apparatus according to any one of claims 6 to 9, wherein the combination unit is further configured to sequentially combine each two adjacent data segments after the alignment operation into the output data including N data units, such that the data units of the first type among the N data units are shifted in front of the data units of the other types.