CN109960486B - Binary data processing method, and apparatus, medium, and system thereof - Google Patents

Binary data processing method, and apparatus, medium, and system thereof

Info

Publication number
CN109960486B
CN109960486B CN201910114581.4A CN201910114581A CN109960486B CN 109960486 B CN109960486 B CN 109960486B CN 201910114581 A CN201910114581 A CN 201910114581A CN 109960486 B CN109960486 B CN 109960486B
Authority
CN
China
Prior art keywords
value
values
predictor
data
binary data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910114581.4A
Other languages
Chinese (zh)
Other versions
CN109960486A (en)
Inventor
孙锦鸿
卢帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN201910114581.4A priority Critical patent/CN109960486B/en
Publication of CN109960486A publication Critical patent/CN109960486A/en
Application granted granted Critical
Publication of CN109960486B publication Critical patent/CN109960486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to the field of data processing, and discloses a binary data processing method, and an apparatus, medium, and system thereof. The binary data processing method comprises the following steps: grouping a plurality of prediction result values; selecting one prediction result value from each group according to the value on the data bit of the binary data corresponding to the smallest prediction result value in that group, to obtain a plurality of selected prediction result values; and, if the smallest of the selected prediction result values belongs to 0 to (N/2-1), determining that value to be the actual result value.

Description

Binary data processing method, and apparatus, medium, and system thereof
Technical Field
The present application relates to the field of data processing, and in particular, to a binary data processing method, apparatus, medium, and system.
Background
The leading zeros of binary data are the consecutive 0s encountered when scanning from the highest bit of the binary data until the first 1 is reached. Computers typically contain specialized hardware circuits to perform Leading Zero Count (LZC) and Leading Symbol Count (LSC) operations. In existing counting schemes, the number of leading zeros is mainly derived, directly or indirectly, by an encoder. For example, the LZC/LSC operation is performed by an encoder and a MUX (multiplexer). The encoder makes the LZC/LSC operation time-consuming, resulting in a large delay.
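As a point of reference, the quantity being counted can be sketched in Python as follows. This is only an illustrative software model of the definition above, not the hardware scheme of the present application; the function name count_leading_zeros is hypothetical.

```python
def count_leading_zeros(bits: str) -> int:
    """Count the 0s seen when scanning from the highest bit until the first 1."""
    count = 0
    for ch in bits:
        if ch == "1":
            break
        count += 1
    return count


print(count_leading_zeros("00110110"))  # two 0s precede the first 1
```

For all-zero data the scan never meets a 1, so the count equals the bit width.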
Disclosure of Invention
The present application is directed to a binary data processing method, apparatus, medium, and system, which can directly select the actual value of leading zeros when the most significant bit 1 of the binary data appears in the high-level portion of the binary data, thereby effectively improving the efficiency of counting leading zeros or leading symbols.
In order to solve the above technical problem, an embodiment of the present application discloses a binary data processing method, including:
grouping a plurality of prediction result values, wherein each group comprises at least two prediction result values, the prediction result values represent the prediction number of leading zeros of the binary data, and belong to 0-N, and N is the bit number of the binary data;
selecting one of the prediction result values from each group of prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value in each group of prediction result values to obtain a plurality of selected prediction result values;
if the smallest selected prediction result value among the plurality of selected prediction result values belongs to 0 to (N/2-1), determining that value to be the actual result value, wherein the actual result value represents the actual number of leading zeros that the binary data has; and if the smallest selected prediction result value does not belong to 0 to (N/2-1), selecting the actual result value from the plurality of selected prediction result values according to the values on the data bits of the binary data corresponding to the selected prediction result values.
An embodiment of the present application further discloses a data processing apparatus, including:
a grouping unit configured to group a plurality of prediction result values, wherein each group includes at least two prediction result values, the prediction result values indicate a predicted number of leading zeros that the binary data has, and the prediction result values belong to 0 to N, where N is a bit number of the binary data;
a selection unit for selecting one of the prediction result values from each group to obtain a plurality of selected prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value of the prediction result values in each group, and
the selection unit determines the minimum selected predictor value among the plurality of selected predictor values to be an actual result value if the minimum selected predictor value belongs to 0 to (N/2-1), wherein the actual result value represents an actual number of the leading zeros that the binary data has; the selection unit selects an actual result value from the plurality of selected predictor values according to the value on the data bit of the binary data corresponding to the selected predictor value if a smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1).
The embodiment of the application also discloses a machine-readable medium, wherein the machine-readable medium is stored with instructions, and the instructions can be used for causing the machine to execute the binary data processing method disclosed by the embodiment.
The embodiment of the present application further discloses a system, comprising:
a memory for storing instructions for execution by one or more processors of the system, and
a processor, being one of the processors of the system, configured to execute the binary data processing method disclosed in the above embodiments.
The embodiments of the present application include, but are not limited to, the following effects:
by selecting the actual numerical value of the leading zero of the binary data from the prediction result values of the leading zero quantity which the binary data may have, the actual numerical value of the leading zero can be directly selected when the highest bit 1 of the binary data appears in the high-order part of the binary data, thereby effectively improving the efficiency of counting the leading zero or the leading symbol.
Further, by grouping the prediction result values two by two and setting the difference value of the intra-group prediction result values, it is possible to determine whether the most significant bit 1 of the binary data appears in the upper portion or the lower portion of the binary data, and directly select the actual value of the leading zero when the most significant bit 1 of the binary data appears in the upper portion of the binary data.
Furthermore, an encoder is omitted, the number of leading zeros in the binary data can be directly selected through the tree multiplexer, and the counting efficiency of the leading zeros is greatly improved.
Drawings
FIG. 1 illustrates a block diagram of a data processing apparatus, according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a tree multiplexer that processes 8-bit binary data, according to some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a method of binary data processing, according to some embodiments of the present application;
FIG. 4 illustrates a flow diagram for one implementation of block 306 of FIG. 3, in accordance with some embodiments of the present application;
FIG. 5A illustrates a block diagram of an in-order pipeline, according to some embodiments of the present application;
FIG. 5B illustrates a block diagram of an in-order architecture core to be included in a processor, according to some embodiments of the present application;
FIG. 6 illustrates a block diagram of a processor that may have more than one core, according to some embodiments of the present application;
FIG. 7 illustrates a block diagram of a system, according to some embodiments of the present application;
FIG. 8 illustrates a block diagram of a system on a chip (SoC), according to some embodiments of the present application;
FIG. 9 illustrates a schematic diagram of a tree multiplexer that processes 12-bit binary data, according to some embodiments of the present application;
FIG. 10 illustrates a schematic diagram of a floating-point operator, according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a binary data processing method, and apparatus, medium, and system thereof.
Various aspects of the illustrative embodiments will be described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. It will be apparent, however, to one skilled in the art that some alternative embodiments may be practiced using portions of the described aspects. For purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the illustrative embodiments. It will be apparent, however, to one skilled in the art that alternative embodiments may be practiced without the specific details. In other instances, well-known features are omitted or simplified in order not to obscure the illustrative embodiments.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It is to be understood that, in the present application, binary data is preferably processed on the premise that its number of bits is even; binary data with an odd number of bits can be converted to an even number of bits by padding. For example, a 1 may be appended after the least significant bit: the 11-bit binary data 00100111001 may be padded to 001001110011, which has an even number of bits. In addition, it can be understood that binary data with an odd number of bits can also be processed by the method of the present application to obtain the number of leading zeros.
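The padding step can be sketched as below. This is a minimal illustration that assumes the data is held as a string of '0' and '1' characters; the helper name pad_to_even is hypothetical.

```python
def pad_to_even(bits: str) -> str:
    """Append a 1 after the least significant bit when the width is odd."""
    if len(bits) % 2 == 0:
        return bits
    return bits + "1"


print(pad_to_even("00100111001"))  # 11 bits in, 12 bits out
```

Appending a 1 rather than a 0 leaves the number of leading zeros unchanged, even for all-zero input, because the appended 1 then becomes the first 1 met by the scan.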
Further, it is understood that, in the present application, a prediction result value indicates a predicted number of leading zeros of the binary data, and the prediction result values belong to 0 to N, where N is a natural number equal to the number of bits of the binary data. For example, 16-bit binary data may have 0 to 16 leading zeros, i.e. the prediction result values are 0 to 16. A prediction result value of 0 indicates that the highest bit (the 15th bit) of the binary data is 1, a prediction result value of 8 indicates that the 7th bit is the highest bit whose value is 1, and a prediction result value of 16 indicates that the binary data is all 0s.
Some embodiments according to the present application disclose a data processing apparatus. Fig. 1 is a schematic configuration diagram of the data processing apparatus. Examples of data processing devices include, but are not limited to, processors. Examples of a processor include, but are not limited to, a host processor, a coprocessor, a general purpose processor, a special purpose processor, and/or the like. The processor may be any system including a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor. The data processing apparatus disclosed in the embodiments of the present application may be a part of a processor, or include a plurality of different processors, in addition to the processor.
Specifically, as shown in fig. 1, the data processing apparatus includes a correspondence unit 101, a grouping unit 102, and a selection unit 103. The correspondence unit 101 maps the plurality of prediction result values to data bits in the binary data, the grouping unit 102 groups the plurality of prediction result values, and the selection unit 103 finally selects an actual result value according to the value on the data bit of the binary data corresponding to the smallest prediction result value in each group of prediction result values.
According to some embodiments of the present invention, the correspondence unit 101 maps the plurality of prediction result values to data bits in the binary data, wherein the data bit corresponding to a prediction result value belonging to 0 to (N-2) satisfies the following condition: if the value on the data bit is 1 and the values on all higher bits are 0, the prediction result value equals the number of bits higher than the data bit; and the lowest data bit of the binary data corresponds to both prediction result values N-1 and N.
For example, for a 10-bit binary data 0000001001, the corresponding relationship between the predicted result value and each data bit is shown in table 1 below:
Table 1: correspondence between the prediction result values and the data bits of 0000001001 (table image not reproduced)
In table 1, no bit is higher than the 9th bit of the binary data, so the prediction result value corresponding to the 9th bit is 0: when the value on the 9th bit is 1, the number of leading zeros (i.e., the actual result value) of the binary data is 0, and when the value on the 9th bit is 0, the actual result value cannot be 0. Similarly, 5 bits are higher than the 4th bit, so the prediction result value corresponding to the 4th bit is 5: when the value on the 4th bit is 1, the actual result value may be 5, and when it is 0, the actual result value cannot be 5. The 0th bit is a special case, because it corresponds to two prediction result values, 9 and 10: when the 0th bit is 1, the actual result value may be 9, and when the 0th bit is 0, the actual result value cannot be 9 but may be 10.
Therefore, for N-bit binary data, the data bits corresponding to the prediction result values 0 to (N-1) are the (N-1)th, (N-2)th, ..., and 0th bits of the binary data, respectively, and the data bit corresponding to the prediction result value N is also the 0th bit of the binary data.
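The correspondence just described can be sketched in Python as follows. This is an illustrative model only: the names data_bit_for and correspondence are hypothetical, bit 0 is taken to be the lowest bit, and the data is assumed to be given as a bit string with the highest bit first.

```python
def data_bit_for(pred: int, n: int) -> int:
    """Index of the data bit (0 = lowest bit) tied to prediction result value pred."""
    if not 0 <= pred <= n:
        raise ValueError("prediction result value must lie in 0..n")
    # Values 0..n-1 map to bits n-1..0; value n shares bit 0 with value n-1.
    return 0 if pred == n else n - 1 - pred


def correspondence(bits: str):
    """Return (prediction result value, data bit index, value on that bit) rows."""
    n = len(bits)
    rows = []
    for pred in range(n + 1):
        idx = data_bit_for(pred, n)
        rows.append((pred, idx, int(bits[n - 1 - idx])))
    return rows


for pred, idx, value in correspondence("0000001001"):
    print(f"prediction {pred:2d} -> bit {idx} (value {value})")
```

Running it on 0000001001 lists one row per prediction result value 0 to 10, with the last two rows both pointing at bit 0.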
According to some embodiments of the present invention, the grouping unit 102 groups the prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2), wherein each group includes two prediction result values and a difference of the two prediction result values is N/2; the grouping unit 102 groups a predictive result value selected from the predictive result value N and the predictive result value N-1 into a group with the predictive result value N/2-1, wherein one predictive result value is selected from the predictive result value N and the predictive result value N-1 by: the predicted result value N-1 is selected if the value on the data bit corresponding to the predicted result value N-1 is 1, and the predicted result value N is selected if the value on the data bit corresponding to the predicted result value N-1 is 0.
For example, the 10-bit binary data 0000001001 may have prediction result values of 0 to 10 grouped as:
S1(0, 5), S2(1, 6), S3(2, 7), S4(3, 8), S5(4, (9, 10))
where the difference between the predicted result values in the sets S1-S4 is 5, i.e., N/2, and the selection result between the predicted result values 9 and 10 in the set S5 is 9 (the value on the data bit corresponding to 9 is 1), i.e., the set S5 is (4, 9).
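The grouping rule above can be sketched as follows, under the assumption that the data is an even-width bit string with the highest bit first; the function name group_predictions is hypothetical.

```python
def group_predictions(bits: str):
    """Pair prediction result values p and p + N/2; the last group pairs
    N/2 - 1 with N-1 or N, chosen by the value on the lowest data bit."""
    n = len(bits)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of bits")
    half = n // 2
    groups = [(p, p + half) for p in range(half - 1)]
    last = n - 1 if bits[-1] == "1" else n  # bit 0 decides between N-1 and N
    groups.append((half - 1, last))
    return groups


print(group_predictions("0000001001"))
```

Every group then holds exactly two candidates, one from the range 0 to (N/2-1) and one from the range N/2 to N.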
In addition, it is understood that the prediction result values of the binary data may be grouped in other manners in the present application, which are not limited herein. For example, the prediction result values 0 to 8 of the binary data 0000001001 may be divided into groups of 3 prediction result values each, with 9 and 10 forming a further group.
According to some embodiments of the present invention, the selection unit 103 finally selects the actual result value according to the value on the data bit of the binary data corresponding to the smallest predictor value in each set of predictor values.
For example, for the above 10-bit binary data 0000001001, the selection unit 103 selects the actual result value in the following manner:
1) One prediction result value is selected from each group according to the value on the data bit of the binary data corresponding to the smallest prediction result value in that group, to obtain at least one selected prediction result value. For example, as described above, the prediction result values of the binary data 0000001001 are grouped into S1(0, 5), S2(1, 6), S3(2, 7), S4(3, 8), and S5(4, 9). The values on the data bits corresponding to the smallest prediction result values 0, 1, 2, 3, and 4 of the groups are all 0, so the prediction result values 5, 6, 7, 8, and 9 are selected.
2) If the smallest of the at least one selected prediction result value belongs to 0 to (N/2-1), that value is determined to be the actual result value; if it does not belong to 0 to (N/2-1), the actual result value is selected from the at least one selected prediction result value according to the values on the data bits corresponding to the selected prediction result values. For example, for this binary data, none of the selected prediction result values 5, 6, 7, 8, and 9 belongs to 0 to 4 (i.e., 0 to (N/2-1)), and therefore the actual result value is selected according to the values on the data bits corresponding to 5, 6, 7, 8, and 9.
Specifically, the actual outcome value is selected from the selected predicted outcome values 5, 6, 7, 8, 9 by:
a) Grouping the selected prediction result values, i.e. grouping the selected prediction result values 5, 6, 7, 8, and 9 as S6(5, 6), S7(7, 8), and S8(9).
b) Selecting one prediction result value from each group according to the value on the data bit corresponding to the smallest prediction result value in that group, to obtain at least one selected prediction result value. That is, since the values on the data bits corresponding to the smallest prediction result values 5, 7, and 9 of the groups are 0, 0, and 1, respectively, the prediction result values 6, 8, and 9 are selected from the groups S6, S7, and S8.
c) Repeating the operations in a) and b) if the number of selected prediction result values obtained is greater than 2. For this binary data, the selected prediction result values 6, 8, and 9 are grouped as S9(6, 8) and S10(9), and a selection is made from each group.
Since the values on the data bits corresponding to the minimum prediction result values 6 and 9 in S9 and S10 are both 1, 6 and 9 are selected from the sets S9 and S10.
d) For the selected predicted result values 6 and 9 as a set, since the value on the data bit corresponding to 6 is 1, 6 is selected as the actual result value.
It will be appreciated that the above steps merely use the binary data 0000001001 as an example; binary data of other bit widths can be processed with the above functions of the data processing apparatus to obtain the actual result value representing its number of leading zeros. These functions may be implemented by software, hardware, or a combination of software and hardware.
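The selection procedure walked through above can be modelled in software roughly as follows. This is a sketch under stated assumptions rather than the implementation of the application: the data is an even-width bit string with the highest bit first, later rounds pair neighbouring selected values (as in S6 to S10), and the names bit_for, select, and leading_zeros_by_selection are hypothetical.

```python
def bit_for(pred: int, bits: str) -> int:
    """Value on the data bit tied to a prediction result value."""
    n = len(bits)
    idx = 0 if pred == n else n - 1 - pred  # bit index, 0 = lowest bit
    return int(bits[n - 1 - idx])


def select(group, bits: str) -> int:
    """Keep the smallest value of the group if its data bit is 1, else the largest."""
    lo = min(group)
    return lo if bit_for(lo, bits) == 1 else max(group)


def leading_zeros_by_selection(bits: str) -> int:
    n = len(bits)
    half = n // 2
    last = select((n - 1, n), bits)
    groups = [(p, p + half) for p in range(half - 1)] + [(half - 1, last)]
    selected = [select(g, bits) for g in groups]
    # Step 2): a selected value in 0..N/2-1 is already the actual result value.
    if min(selected) <= half - 1:
        return min(selected)
    # Steps a) to d): regroup neighbours and select until one value remains.
    while len(selected) > 1:
        groups = [tuple(selected[i:i + 2]) for i in range(0, len(selected), 2)]
        selected = [select(g, bits) for g in groups]
    return selected[0]


print(leading_zeros_by_selection("0000001001"))  # 6, matching step d)
```

For 0000001001 the successive rounds yield 5, 6, 7, 8, 9, then 6, 8, 9, then 6, 9, and finally 6, mirroring the groups S1 to S10.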
The implementation of the selection unit is described below by taking a tree multiplexer as an example. The candidate values of each multiplexer in the tree multiplexer are the prediction result values grouped by the grouping unit. Fig. 2 shows a tree multiplexer that processes 8-bit binary data. The tree multiplexer has three levels of multiplexers (MUX), each of which selects one prediction result value from a group. The input signal to each multiplexer may be the value on the data bit corresponding to the smaller prediction result value in its group. That is, the input signals sel00 to sel04 of the first-level multiplexers m1 to m5 are the values on the data bits corresponding to the smaller of the candidate prediction result values of the multiplexers m1 to m5, respectively.
For example, for eight-bit binary data 00110110, the tree multiplexer selects the actual result value by:
1) Each first-level multiplexer (m1 to m5) selects one of its two candidate prediction result values according to the value on the data bit of the binary data corresponding to the smaller candidate, to obtain at least one selected prediction result value. For eight-bit binary data 00110110, the input signals sel00 to sel04 of the first-level multiplexers m1 to m5 are the values on the data bits corresponding to the prediction result values 0, 1, 2, 3, and 7, namely 0, 0, 1, 1, and 0, respectively. According to these values, the selection result of m5 is 8 and the selection results of m1 to m4 are 4, 5, 2, and 3, respectively.
2) The candidate values for the second-level multiplexers m6 and m7 are (4, 5) and (2, 3), respectively.
Since the smallest selected prediction result value among the above selection results, 2, belongs to 0 to 3, m7 selects 2, and 2 is output as the final actual result value (the output of m8).
As another example, for eight-bit binary data 00000010, the tree multiplexer selects the actual result value by:
1) Each first-level multiplexer (m1 to m5) selects one of its two candidate prediction result values according to the value on the data bit of the binary data corresponding to the smaller candidate, to obtain at least one selected prediction result value. Here the input signals sel00 to sel04 of the first-level multiplexers m1 to m5, i.e. the values on the data bits corresponding to the prediction result values 0, 1, 2, 3, and 7, are all 0. Therefore, the selection result of the first-level multiplexer m5 is 8, and the selection results of m1 to m4 are 4, 5, 6, and 8, respectively.
2) The candidate values for the second-level multiplexers m6 and m7 are (4, 5) and (6, 8), respectively.
Since the smallest selected prediction result value in the selection results, 4, does not belong to 0 to 3, the second-level multiplexers m6 and m7 select 5 and 6 according to the values 0 and 1 on the data bits corresponding to 4 and 6.
3) The candidate values for the third-level multiplexer m8 are 5 and 6.
Since the value on the data bit corresponding to the smaller candidate value 5 is 0, m8 outputs the larger value, 6, as the actual result value.
It is understood that the above two 8-bit binary data are only examples illustrating the operating principle of the tree multiplexer in fig. 2, and are not restrictive. Those skilled in the art can use the tree multiplexer to process any binary data of 8 bits or fewer and obtain its number of leading zeros.
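The behaviour of the 8-bit tree multiplexer can be modelled in software roughly as follows. The wiring used here (m5 feeding m4, m1 and m2 feeding m6, m3 and m4 feeding m7, and m6 and m7 feeding m8) is an assumption inferred from the two worked examples, since the figure itself is not reproduced in this text; the function names are hypothetical and the data is taken as an integer in 0 to 255.

```python
N = 8  # bit width handled by this sketch


def data_bit(x: int, pred: int) -> int:
    """Value on the data bit tied to a prediction result value (bit 0 = lowest)."""
    idx = 0 if pred == N else N - 1 - pred
    return (x >> idx) & 1


def mux(a: int, b: int, x: int) -> int:
    """2:1 multiplexer: output the smaller candidate if its data bit is 1,
    otherwise output the larger candidate."""
    lo, hi = min(a, b), max(a, b)
    return lo if data_bit(x, lo) else hi


def lzc8_tree(x: int) -> int:
    # First level: m5 resolves (7, 8); m1..m4 pair p with p + 4.
    m5 = mux(7, 8, x)
    m1, m2, m3, m4 = mux(0, 4, x), mux(1, 5, x), mux(2, 6, x), mux(3, m5, x)
    # Second level.
    m6, m7 = mux(m1, m2, x), mux(m3, m4, x)
    # Third level: m8 produces the actual result value.
    return mux(m6, m7, x)


print(lzc8_tree(0b00110110), lzc8_tree(0b00000010))  # 2 and 6, as in the examples
```

In this model every multiplexer applies the same rule, so no encoder is needed to turn a bit position into a count; the counts themselves are the values being routed.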
FIG. 9 shows a tree multiplexer that processes 12-bit binary data. The tree multiplexer has three levels of multiplexers (MUX), each of which selects one prediction result value from a group. The input signal to each multiplexer may be the value on the data bit corresponding to the smaller prediction result value in its group. That is, the input signals sel00 to sel06 of the first-level multiplexers m1 to m7 are the values on the data bits corresponding to the smaller of the candidate prediction result values of the multiplexers m1 to m7, respectively.
For example, for 12-bit binary data 000000000110, the tree multiplexer selects the actual result value by:
1) Each first-level multiplexer (m1 to m7) selects one of its two candidate prediction result values according to the value on the data bit of the binary data corresponding to the smaller candidate, to obtain at least one selected prediction result value. For 12-bit binary data 000000000110, the input signals sel00 to sel06 of the first-level multiplexers m1 to m7, i.e. the values on the data bits corresponding to the prediction result values 0, 1, 2, 3, 4, 5, and 11, are all 0. According to these values, the selection result of m7 is 12 and the selection results of m1 to m6 are 6, 7, 8, 9, 10, and 12, respectively.
2) The candidate values for the second-level multiplexers m8, m9, and m10 are (6, 7), (8, 9), and (10, 12), respectively.
The input signals sel10 to sel12 of the second-level multiplexers m8 to m10 are the values on the data bits corresponding to the prediction result values 6, 8, and 10, namely 0, 0, and 1, respectively. According to these values, the selection results of m8 to m10 are 7, 9, and 10, respectively.
3) The candidate values for the third-level multiplexer m11 are (7, 9, 10). Since the values on the data bits corresponding to 7, 9, and 10 are 0, 1, and 1, respectively, the prediction result value 9 is selected as the actual result value.
It is to be understood that although the present embodiment takes the calculation of leading zeros for 10-bit, 8-bit, 12-bit binary data as an example, it will be understood by those skilled in the art that the tree multiplexer of the present embodiment can be used for the calculation of leading zeros for any binary data of any number of bits.
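A software model of a 12-bit tree with a three-input final stage might look like the following. The wiring is an assumption inferred from the description of m1 to m11 above, the rule used for the three-input multiplexer (take the smallest candidate whose data bit is 1, otherwise the largest) is likewise an assumption, and all function names are hypothetical.

```python
N = 12  # bit width handled by this sketch


def data_bit(x: int, pred: int) -> int:
    """Value on the data bit tied to a prediction result value (bit 0 = lowest)."""
    idx = 0 if pred == N else N - 1 - pred
    return (x >> idx) & 1


def priority_mux(candidates, x: int) -> int:
    """Output the smallest candidate whose data bit is 1, else the largest."""
    ordered = sorted(candidates)
    for cand in ordered:
        if data_bit(x, cand):
            return cand
    return ordered[-1]


def lzc12_tree(x: int) -> int:
    # First level: m7 resolves (11, 12); m1..m6 pair p with p + 6.
    m7 = priority_mux((11, 12), x)
    first = [priority_mux((p, p + 6), x) for p in range(5)]
    first.append(priority_mux((5, m7), x))
    # Second level: m8, m9, m10 each take two neighbouring first-level outputs.
    second = [priority_mux(first[i:i + 2], x) for i in (0, 2, 4)]
    # Third level: m11 chooses among the three second-level outputs.
    return priority_mux(second, x)


# Compare the model against a direct count for every 12-bit input.
assert all(lzc12_tree(x) == N - x.bit_length() for x in range(1 << N))
```

With two candidates the priority rule reduces to the two-input rule described earlier, so the same helper serves all three levels.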
Furthermore, it will be appreciated that the tree multiplexer may be any tree multiplexer located in a data processing module or device, for example in the processor described above, or in a floating point operator.
FIG. 10 shows a schematic diagram of a common exemplary floating-point operator. The leading zero calculator can calculate the number of leading zeros in the binary data by adopting the technical scheme disclosed by the application. The MUX is a data selector. Exponent difference logic is used to calculate the difference between exponent 1 and exponent 2, a logical negation is used to negate operand 2, each shifter and shift logic unit (e.g., an alignment shift) is used to shift the data, and an adder is a unit that generates the sum of the data. It is understood that the leading zero calculation technical solution of the present application can also be used for floating-point arithmetic units with other structures, and is not limited to the structure in fig. 10.
As mentioned above, the present application directly uses a tree multiplexer to determine the number of leading zeros in binary data, omitting the encoder and greatly improving the efficiency of counting leading zeros.
Some embodiments according to the present application disclose a method of processing binary data. FIG. 3 is a flow diagram of a method of processing binary data according to some embodiments of the present application.
According to some embodiments of the present application, in block 301, the correspondence unit 101 or another unit maps a plurality of prediction result values to data bits in binary data.
It is understood that, in the present application, a data bit in the binary data corresponds to a prediction result value when the value on that data bit determines whether the actual result value of the binary data can be that prediction result value. For N-bit binary data, the data bits corresponding to the prediction result values belonging to 0 to (N-1) satisfy the following condition:
if the value on the data bit is 1 and the values on all higher bits are 0, the prediction result value equals the number of bits higher than the data bit; and the lowest data bit of the binary data corresponds to both prediction result values N-1 and N.
In block 302, the grouping unit 102 or other unit groups a plurality of predicted result values of binary data. Specifically, prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2) are grouped, wherein each group includes two prediction result values and the difference value of the two prediction result values is N/2; grouping a predictor value selected from the predictor value N and the predictor value N-1 with the predictor value N/2-1, wherein one predictor value is selected from the predictor value N and the predictor value N-1 by: the predicted result value N-1 is selected if the value on the data bit corresponding to the predicted result value N-1 is 1, and the predicted result value N is selected if the value on the data bit corresponding to the predicted result value N-1 is 0.
In this manner, by grouping the prediction result values two by two and setting the difference value of the intra-group prediction result values, it is possible to determine whether the most significant bit 1 of the binary data appears in the upper portion or the lower portion of the binary data, and directly select the actual value of the leading zero when the most significant bit 1 of the binary data appears in the upper portion of the binary data.
Furthermore, it is understood that the predictor values can be grouped in other grouping forms, for example, each group includes 4 or 6 predictor values, and then a selection is made from each group, which is not limited herein.
In block 303, the selection unit 103 or other unit selects one of the predictor values from each set of predictor values based on the value on the data bit of the binary data corresponding to the smallest predictor value in each set of predictor values to obtain a plurality of selected predictor values. Specifically, if the value of the data bit corresponding to the smaller one of the set of predictor values is 1, the smaller predictor value is selected, and if the value of the data bit corresponding to the smaller one of the set of predictor values is 0, the larger one of the set is selected.
In block 304, the selection unit 103 or another unit determines whether the smallest selected prediction result value among the plurality of selected prediction result values belongs to 0 to (N/2-1).
If the result of the above determination is yes, then selection unit 103 or another unit determines the minimum selected predicted outcome value to be the actual outcome value in block 305.
If the result of the above determination is negative, then in block 306 selection unit 103 or other unit selects an actual outcome value from the plurality of selected predictor values based on the value on the data bit of the binary data corresponding to the selected predictor value.
Specifically, as shown in fig. 4, the selection unit 103 selects an actual result value from among a plurality of selected predicted result values by:
in block 401, the selection unit 103 or other unit groups the plurality of selected prediction result values;
in block 402, the selection unit 103 or other unit selects a predictor value from each set of predictor values based on the value on the data bit on the binary data corresponding to the smallest predictor value in each set of predictor values to obtain at least one selected predictor value;
in block 403, the selection unit 103 or other unit determines the number of resulting selected predictor values;
if the number of resulting selected predictor values is 1, then selection unit 103 or other units determine the selected predictor value as the actual result value in block 404;
if the number of resulting selected predictor values is 2, block 402 is repeated;
if the number of resulting selected predictor values is greater than 2, blocks 401 and 402 are repeated.
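Blocks 401 to 404 amount to a repeated pairwise reduction, which can be sketched as below. The names `data_bit` and `reduce_selected` are hypothetical. Grouping the values pairwise in ascending order and carrying an unpaired value into the next round are assumptions of this sketch, since the application leaves the grouping form open.

```python
def data_bit(x: int, n: int, p: int) -> int:
    """Value of the data bit that predictor p corresponds to in the n-bit word x."""
    return (x >> max(n - 1 - p, 0)) & 1


def reduce_selected(x: int, n: int, selected: list[int]) -> int:
    """Group and select repeatedly until a single predictor value remains."""
    if not selected:
        raise ValueError("at least one selected predictor value is required")
    values = sorted(selected)
    while len(values) > 1:
        survivors = []
        # Group neighbouring values in pairs and keep one value per pair.
        for i in range(0, len(values) - 1, 2):
            small, large = values[i], values[i + 1]
            survivors.append(small if data_bit(x, n, small) else large)
        if len(values) % 2:
            # Assumption: an unpaired value is carried into the next round.
            survivors.append(values[-1])
        values = survivors
    return values[0]


print(reduce_selected(0b00000010, 8, [4, 5, 6, 8]))  # 6
```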
The present embodiment is described below by way of specific examples. In one example, the binary data is 00010110, so the most significant 1 of the binary data appears in the upper portion of the data (i.e., the 0th to N/2th bits counted from left to right), and the processing method is as follows:
1) the prediction result value of the binary data is 0-8, and the data bit and the value thereof corresponding to each prediction result value are shown in the following table 2:
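[Table 2: the data bit corresponding to each prediction result value and the value on that data bit; provided as an image in the original]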
grouping the prediction result values pairwise, wherein the grouping results are as follows:
S1(0, 4), S2(1, 5), S3(2, 6), S4(3, (7, 8))
2) A prediction result value is selected from each group based on the data bit corresponding to the smaller prediction result value in the group: the smaller prediction result value is selected when the value on the corresponding data bit is 1, and the larger prediction result value is selected when the value on the corresponding data bit is 0. Here bit 0 is 0, so 8 is selected from 7 and 8. The selection results are as follows: 4, 5, 6, 3
Since the selected prediction result values include 3, which belongs to 0 to 3, the actual number of leading zeros of the binary data is determined to be 3.
In another example, the binary data is 00000010, so the most significant 1 of the binary data appears in the lower portion of the data (i.e., the 0th to N/2th bits counted from right to left), and the processing method is as follows:
1) the prediction result value of the binary data is 0-8, and the data bit and the value thereof corresponding to each prediction result value are shown in the following table 2:
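[Table: the data bit corresponding to each prediction result value and the value on that data bit; provided as an image in the original]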
grouping the prediction result values pairwise, wherein the grouping results are as follows:
S1(0, 4), S2(1, 5), S3(2, 6), S4(3, (7, 8))
2) A prediction result value is selected from each group based on the data bit corresponding to the smaller prediction result value in the group: the smaller prediction result value is selected when the value on the corresponding data bit is 1, and the larger prediction result value is selected when the value on the corresponding data bit is 0. Here bit 0 is 0, so 8 is selected from 7 and 8. The selection results are as follows: 4, 5, 6, 8
Since none of the selected prediction result values belongs to 0 to 3, the actual result value must be selected from the selected prediction result values 4, 5, 6, 8. The selection method is as follows:
the 4, 5, 6, 8 are grouped in order from small to large into S5(4, 5) and S6(6, 8).
Since the values on the data bits corresponding to the smaller prediction result values 4 and 6 of S5 and S6 are 0 and 1, respectively, the selection results of the above grouping are 5 and 6. Since 6 is the minimum prediction result value in S6, 6 is determined to be the actual result value.
Further, it is understood that, in another example, if the binary data is 00000000, the results selected in step 2) are 4, 5, 6, 8; then 5 and 8 are selected from S5(4, 5) and S6(6, 8), respectively, and form a group S7(5, 8). Since the value on the data bit corresponding to the prediction result value 5 is 0, 8 is selected as the actual result value.
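The three worked examples above can be reproduced with one end-to-end sketch of blocks 302 to 306. The function name `count_leading_zeros`, the integer representation of the data, the even bit width, and the ascending pairwise regrouping in the second stage are assumptions made for illustration. This is a software model of the selection logic, not the hardware implementation of the application.

```python
def count_leading_zeros(x: int, n: int = 8) -> int:
    """Leading-zero count of the n-bit word x via grouped predictor selection."""
    if n < 2 or n % 2 or not 0 <= x < (1 << n):
        raise ValueError("this sketch assumes an even n >= 2 and 0 <= x < 2**n")
    half = n // 2

    def bit(p: int) -> int:
        # Predictor p maps to bit n-1-p; predictors n-1 and n both map to bit 0.
        return (x >> max(n - 1 - p, 0)) & 1

    def pick(small: int, large: int) -> int:
        # Keep the smaller predictor if its data bit is 1, otherwise the larger.
        return small if bit(small) else large

    # Block 302: pairs that differ by n/2, plus (n/2 - 1, n-1 or n).
    groups = [(k, k + half) for k in range(half - 1)]
    groups.append((half - 1, n - 1 if x & 1 else n))

    # Block 303: one selected predictor per group.
    selected = [pick(small, large) for small, large in groups]

    # Blocks 304 and 305: a minimum below n/2 is already the answer.
    smallest = min(selected)
    if smallest < half:
        return smallest

    # Block 306: regroup pairwise and select until one value remains.
    values = sorted(selected)
    while len(values) > 1:
        survivors = [pick(values[i], values[i + 1])
                     for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            survivors.append(values[-1])  # assumption: carry the unpaired value
        values = survivors
    return values[0]


print(count_leading_zeros(0b00010110))  # 3
print(count_leading_zeros(0b00000010))  # 6
print(count_leading_zeros(0b00000000))  # 8
```

A simple way to gain confidence in the sketch is to compare it with `n - x.bit_length()` for every n-bit input.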
According to the method and the device, the actual numerical value of the leading zero of the binary data is selected from the prediction result values of the leading zero quantity which the binary data possibly have, and when the highest bit 1 of the binary data appears in the high-order part of the binary data, the actual numerical value of the leading zero can be directly selected, so that the efficiency of counting the leading zero or the leading symbol is effectively improved.
FIG. 5A is a block diagram illustrating a processor pipeline according to an embodiment of the present application. FIG. 5B is a block diagram illustrating one architecture core to be included in a processor according to an embodiment of the present application.
In FIG. 5A, a processor pipeline 500 includes, but is not limited to, an instruction fetch stage 501, an instruction decode stage 502, an instruction execution and memory access stage 503, a write back/write stage 504, an instruction retirement stage 505, and/or other pipeline stages. Although FIG. 5A illustrates an in-order pipeline, those skilled in the art will appreciate that other techniques may implement other embodiments of the processor pipeline shown in FIG. 5A. For example, processor pipeline 500 may contain other or different processing stages, such as register renaming, out-of-order issue/execution pipelines, and so forth. In particular, processor pipeline 500 may also include, but is not limited to, a length decode stage (not shown) to length decode fetched instructions; an allocate stage (not shown), a register rename stage (not shown), and a scheduling stage (also called a dispatch or issue stage) (not shown) for decoded instructions; an exception handling stage and a commit stage (not shown), and so on.
In FIG. 5B, processor core 510 includes, but is not limited to, an L1 instruction cache unit 511, an instruction fetch and decode unit 512, registers 513, an execution unit 514, a load/store unit 515, an L1 data cache unit 516, and/or other units. Processor core 510 may be a Reduced Instruction Set Computing (RISC) core, a Complex Instruction Set Computing (CISC) core, a Very Long Instruction Word (VLIW) core, or a hybrid or alternative core type. As another option, processor core 510 may be a special-purpose core, such as a network or communication core, compression engine, coprocessor core, general purpose computing graphics processor unit (GPGPU) core, graphics core, or the like.
According to some embodiments of the present application, instruction fetch and decode unit 512 fetches instructions from L1 instruction cache 511 and performs instruction decode functions, generating as output one or more micro-operations, micro-code entry points, micro-instructions, other instructions, or other control signals decoded from, or otherwise reflective of, the original instructions. Instruction fetch and decode unit 512 may be implemented using a variety of different mechanisms. Examples of suitable mechanisms include, but are not limited to, look-up tables, hardware implementations, Programmable Logic Arrays (PLAs), microcode read-only memories (ROMs), and the like. In one embodiment, core 510 includes a microcode ROM or other medium for storing microcode for certain macro-instructions. Instruction fetch and decode unit 512 may be coupled to execution unit 514 and/or load/store unit 515 through registers 513. The registers 513 include one or more registers, where different registers store one or more different data types, such as scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point, status (e.g., an instruction pointer that is the address of the next instruction to be executed), and so forth.
Those skilled in the art of the present application will appreciate that other techniques may implement other aspects of the present application. For example, instruction fetch and decode unit 512 may also be coupled to execution unit 514 and/or load/store unit 515 (not shown) without registers 513.
Execution unit 514 and load/store unit 515 implement execution functions in a processor pipeline. According to some embodiments of the present application, a set of one or more execution units 514 and a set of one or more load/store units 515 may constitute an execution engine of a processor. Execution unit 514 may perform various operations (e.g., shifts, additions, subtractions, multiplications) on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point). Accordingly, the execution unit 514 may include, but is not limited to, a scalar arithmetic logic operation unit, a vector arithmetic logic operation unit, a fixed function unit (fix function unit), and/or the like. While some embodiments may include, but are not limited to, multiple execution units dedicated to a particular function or set of functions, other embodiments may include, but are not limited to, only one execution unit or multiple execution units that all perform all functions. The registers 513 and the L1 data cache 516 implement write back/write and instruction retirement functions in the pipeline.
It should be understood that other techniques may implement other embodiments of the processor core architecture of FIG. 5B. For example, instruction fetch and decode unit 512 of processor core 510 may also perform a length decode stage; processor core 510 may further include, but is not limited to, a register rename/allocate unit (not shown) and a dispatch unit (not shown) coupled between instruction fetch and decode unit 512 and registers 513, wherein the register rename/allocate unit performs the register rename stage/allocate stage and the dispatch unit performs the dispatch stage; and in an out-of-order issue/execution core architecture, these units may be involved in the exception handling stage, and so on.
In some embodiments of the present application, processor core 510 is coupled to L2 memory 517, which includes, but is not limited to, a level two (L2) cache unit (not shown), which L2 cache unit may be further coupled to one or more other levels of cache, and ultimately to main memory (not shown).
It should be appreciated that the core 510 may support multithreading (performing two or more parallel operations or sets of threads), and may be accomplished in a variety of ways including, but not limited to, time-division multithreading, simultaneous multithreading (where a single physical core provides a logical core for each of the threads that the physical core is simultaneously multithreading), or a combination thereof.
Although register renaming is described in the context of out-of-order execution, it should be understood that register renaming may be used in an in-order architecture. While the illustrated embodiment of the processor also includes, but is not limited to, separate instruction and data caches 511/516 and a shared L2 memory 517, alternative embodiments may have a single internal cache for both instructions and data, such as, for example, a level one (L1) internal cache or multiple levels of internal cache. In some embodiments, the system may include, but is not limited to, a combination of an internal cache and an external cache, where the external cache is external to the core and/or external to the processor. Alternatively, all caches may be external to the core and/or external to the processor.
FIG. 6 is a block diagram of a processor that may have more than one core according to an embodiment of the application. In one embodiment, processor 600 may include, but is not limited to, one or more processor cores 602A-602N. Each processor core 602A-602N may include, but is not limited to, a cache unit 604A-604N and a register unit 606A-606N. It should be understood that the processor cores 602A-602N may also include other processor core units as shown in fig. 5B, according to another embodiment, but are not repeated here to simplify the description.
It should be understood that other techniques may implement other embodiments of the processor architecture shown in FIG. 6. For example, processor 600 may also include a system agent unit (not shown), one or more bus controller units (not shown), dedicated logic (not shown), and so forth. The dedicated logic (not shown) may comprise, among others, one or more dedicated cores (not shown) for scientific (throughput) computing. According to one embodiment, the processor cores 602A-602N may be one or more general purpose cores (e.g., a general purpose in-order core, a general purpose out-of-order core, or a combination of both), or may be one or more dedicated cores primarily for graphics and/or scientific (throughput) computing. Thus, the processor 600 may be a general-purpose processor, a coprocessor, or a special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, GPGPU (general purpose graphics processing unit), a high-throughput Many Integrated Core (MIC) coprocessor, embedded processor, or the like. The processor may be implemented on one or more chips. Processor 600 may be part of one or more substrates and/or processor 600 may be implemented on one or more substrates using any of a number of processing technologies, such as, for example, BiCMOS, CMOS, or NMOS.
The memory hierarchy of the processor includes one or more levels of cache within each core, and a set of one or more shared cache units (not shown). The set of shared cache units may include one or more mid-level caches, such as a level two (L2), a level three (L3), a level four (L4), or other levels of cache, a Last Level Cache (LLC), and/or combinations thereof. In one embodiment, processor 600 may also include a ring-based interconnect unit (not shown) to interconnect the dedicated logic (not shown), the set of shared cache units (not shown), and the system agent unit (not shown) described above, although alternative embodiments may use any number of well-known techniques to interconnect these units.
In some embodiments, one or more of cores 602A-N may be multi-threaded. The system agent unit (not shown) described above includes, but is not limited to, components that coordinate and operate cores 602A-N, such as a Power Control Unit (PCU) and a display unit. The PCU may be or include the logic and components necessary for adjusting the power states of cores 602A-N and/or the dedicated logic (not shown) described above. The display unit is used to drive one or more externally connected displays.
The cores 602A-N may be homogeneous or heterogeneous in terms of architectural instruction set; that is, two or more of the cores 602A-N may be capable of executing the same instruction set, while other cores may be capable of executing only a subset of the instruction set or a different instruction set.
FIG. 7 is a block diagram of a system according to an embodiment of the present application. The system includes, but is not limited to, laptop devices, desktop machines, handheld PCs, personal digital assistants, engineering workstations, servers, network appliances, network hubs, switches, embedded processors, Digital Signal Processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cellular telephones, portable media players, handheld devices, and various other electronic devices. In general, a wide variety of systems and electronic devices capable of containing the processors and/or other execution logic disclosed in this application are suitable.
Referring now to FIG. 7, shown is a block diagram of a system 700 in accordance with one embodiment of the present application. System 700 may include one or more processors 701 coupled to a controller hub 703. In one embodiment, controller hub 703 includes, but is not limited to, a Graphics Memory Controller Hub (GMCH) (not shown) and an input/output hub (IOH) (which may be on separate chips) (not shown), where the GMCH includes a memory and a graphics controller and is coupled with the IOH. System 700 may also include coprocessor 702 and memory 704 coupled to controller hub 703. Alternatively, one or both of the memory and GMCH may be integrated within the processor (as described herein), with the memory 704 and coprocessor 702 coupled directly to the processor 701 and controller hub 703, with the controller hub 703 and IOH in a single chip.
The optional nature of the additional processor 702 is represented in fig. 7 by dashed lines. Processor 701 may include one or more of the processing cores described herein and may be some version of processor 600.
The memory 704 may be, for example, Dynamic Random Access Memory (DRAM), Phase Change Memory (PCM), or a combination of the two. For at least one embodiment, controller hub 703 communicates with processor 701 via a multi-drop bus such as a front-side bus (FSB), a point-to-point interface such as a QuickPath Interconnect (QPI), or similar connection 706.
In one embodiment, coprocessor 702 is a special-purpose processor, such as, for example, a high-throughput MIC processor, a network or communication processor, compression engine, graphics processor, GPGPU, embedded processor, or the like. In one embodiment, controller hub 703 may include an integrated graphics accelerator. The instruction execution method proposed in the present application may be performed by the coprocessor 702. And the architecture of the coprocessor may also be some version of the processor 600.
In one embodiment, processor 701 executes instructions that control data processing operations of a general type. Coprocessor instructions may be embedded in these instructions. The processor 701 recognizes these coprocessor instructions as being of a type that should be executed by the attached coprocessor 702. Thus, the processor 701 issues these coprocessor instructions (or control signals representing coprocessor instructions) on a coprocessor bus or other interconnect to coprocessor 702. Coprocessor 702 accepts and executes the received coprocessor instructions.
Referring now to FIG. 8, shown is a block diagram of a SoC (System on Chip) 800 in accordance with an embodiment of the present application. In FIG. 8, like parts have the same reference numerals. In addition, the dashed boxes are optional features of more advanced SoCs. In FIG. 8, interconnect unit 850 is coupled to an application processor 810, which includes, but is not limited to, a set of one or more cores 602A-N as well as shared cache units 604A-604N and registers 606A-606N; a system agent unit 880; a bus controller unit 890; an integrated memory controller unit 840; a set of one or more coprocessors 820, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a Static Random Access Memory (SRAM) unit 830; and a Direct Memory Access (DMA) unit 860. In one embodiment, coprocessor 820 includes a special-purpose processor, such as, for example, a network or communication processor, compression engine, GPGPU, a high-throughput MIC processor, embedded processor, or the like.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
Further technical solutions of the present application are summarized in the following examples:
Example 1: a method of processing binary data, comprising:
grouping a plurality of prediction result values, wherein each group comprises at least two prediction result values, the prediction result values represent the prediction number of leading zeros of the binary data, and belong to 0-N, and N is the bit number of the binary data;
selecting one of the prediction result values from each group of prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value in each group of prediction result values to obtain a plurality of selected prediction result values;
determining a minimum selected predictor value of the plurality of selected predictor values to be an actual result value if the minimum selected predictor value belongs to 0 to (N/2-1), wherein the actual result value represents an actual number of the leading zeros the binary data has;
selecting an actual result value from the plurality of selected predictor values based on the value on the data bit of the binary data corresponding to the selected predictor value if a smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1).
Example 2: the method of example 1, further comprising:
associating the plurality of predictor values with the data bits in the binary data, wherein:
the data bits corresponding to the prediction result values belonging to 0 to (N-2) satisfy the following condition: if the values on the data bits are 1 and the values on the higher bits than the data bits are all 0, the predicted result value is the number of bits of the higher bits than the data bits; and the data bits corresponding to the predictor values N-1 and N are the lowest data bits of the binary data.
Example 3: the method of example 1, wherein the grouping the plurality of prediction result values comprises:
grouping prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2), wherein each group includes two prediction result values and a difference value of the two prediction result values is N/2; and grouping a predictor value selected from the predictor value N and the predictor value N-1 with the predictor value N/2-1, wherein one predictor value is selected from the predictor value N and the predictor value N-1 by:
if the value on the data bit corresponding to the predicted result value N-1 is 1, the predicted result value N-1 is selected; and
if the value on the data bit corresponding to the predicted result value N-1 is 0, the predicted result value N is selected.
Example 4: the method of example 1, wherein said selecting an actual outcome value from the plurality of selected predictor values based on the value on the data bit of the binary data corresponding to the selected predictor value if a smallest predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1) comprises:
grouping the plurality of selected prediction result values;
selecting a predictor value from each set of predictor values according to a value on a data bit on the binary data corresponding to a smallest predictor value in each set of predictor values to obtain at least one selected predictor value;
repeating the grouping and the selecting until the number of the at least one selected predictor values obtained is 1, and determining the obtained selected predictor value as an actual result value.
Example 5: the method of any of examples 1 to 4, wherein the selecting one predictor value from each set of predictor values comprises:
selecting the minimum predicted result value if the value of the data bit corresponding to the minimum predicted result value in each group of predicted result values is 1, and selecting the predicted result values in the group except the minimum predicted result value if the value of the data bit corresponding to the minimum predicted result value in each group of predicted result values is 0.
Example 6: a data processing apparatus, comprising:
a grouping unit configured to group a plurality of prediction result values, wherein each group includes at least two prediction result values, the prediction result values indicate a predicted number of leading zeros that the binary data has, and the prediction result values belong to 0 to N, where N is a bit number of the binary data;
a selection unit for selecting one of the prediction result values from each group to obtain a plurality of selected prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value of the prediction result values in each group, and
the selection unit determines the minimum selected predictor value among the plurality of selected predictor values to be an actual result value if the minimum selected predictor value belongs to 0 to (N/2-1), wherein the actual result value represents an actual number of the leading zeros that the binary data has; the selection unit selects an actual result value from the plurality of selected predictor values according to the value on the data bit of the binary data corresponding to the selected predictor value if a smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1).
Example 7: the data processing apparatus of example 6, further comprising:
a correspondence unit for corresponding the plurality of predictor values to the data bits in the binary data, wherein:
the data bits corresponding to the prediction result values belonging to 0 to (N-2) satisfy the following condition: if the values on the data bits are 1 and the values on the higher bits than the data bits are all 0, the predicted result value is the number of bits of the higher bits than the data bits; and the lowest data bits of the binary data correspond to the prediction result values N-1 and N.
Example 8: the data processing apparatus of example 6, wherein the grouping unit groups the plurality of prediction result values by:
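grouping prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2), wherein each group includes two prediction result values and a difference value of the two prediction result values is N/2; and grouping a predictor value selected from the predictor value N and the predictor value N-1 with the predictor value N/2-1, wherein one predictor value is selected from the predictor value N and the predictor value N-1 by: selecting the predictor value N-1 if the value on the data bit corresponding to the predictor value N-1 is 1, and selecting the predictor value N if the value on the data bit corresponding to the predictor value N-1 is 0.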
Example 9: the data processing apparatus of example 6, wherein the selection unit to select an actual outcome value from the plurality of selected predictor outcome values comprises:
grouping the plurality of selected prediction result values;
selecting a predictor value from each set of predictor values according to a value on a data bit on the binary data corresponding to a smallest predictor value in each set of predictor values to obtain at least one selected predictor value;
repeating the grouping and the selecting until the number of the at least one selected predictor values obtained is 1, and determining the obtained selected predictor value as an actual result value.
Example 10: the data processing apparatus according to any one of examples 6 to 9, wherein the selection unit selects one predictor value from each set of predictor values by:
selecting the minimum predicted result value if the value of the data bit corresponding to the minimum predicted result value in each group of predicted result values is 1, and selecting the predicted result values in the group except the minimum predicted result value if the value of the data bit corresponding to the minimum predicted result value in each group of predicted result values is 0.
Example 11: the data processing apparatus of example 10, wherein the selection unit comprises a tree multiplexer.
Example 12: a machine-readable medium having stored thereon instructions which, when executed on a machine, cause the machine to perform a method of processing binary data as described in any one of examples 1 to 5.
Example 13: a system, comprising:
a memory for storing instructions for execution by one or more processors of the system, and
a processor, being one of the processors of the system, for performing the method of processing binary data according to any one of examples 1 to 5.
As used herein, the term module or unit may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), Random Access Memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules. The physical implementation of a logical unit/module is not what matters most; the combination of functions implemented by the logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving that technical problem; this does not mean that no other units/modules exist in those embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (13)

1. A method for processing binary data, comprising:
in the case where the number of bits of binary data is an even number, the following operations are performed:
grouping a plurality of prediction result values, wherein each group comprises two prediction result values, each prediction result value represents a predicted number of leading zeros of the binary data and belongs to the range 0 to N, and N is the bit number of the binary data;
selecting one of the prediction result values from each group of prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value in each group of prediction result values to obtain a plurality of selected prediction result values;
determining a minimum selected predictor value of the plurality of selected predictor values to be an actual result value if the minimum selected predictor value belongs to 0 to (N/2-1), wherein the actual result value represents an actual number of leading zeros that the binary data has; and
selecting the actual result value from the plurality of selected predictor values based on the value on the data bit of the binary data corresponding to the selected predictor value if the smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1).
2. The method of claim 1, further comprising associating the plurality of predictor values with the data bits in the binary data, wherein: the data bit corresponding to each prediction result value belonging to 0 to (N-2) satisfies the following condition: if the value on the data bit is 1 and the values on all bits higher than the data bit are 0, the prediction result value equals the number of bits higher than the data bit; and the data bit corresponding to the prediction result values N-1 and N is the lowest data bit of the binary data.
3. The method of claim 1, wherein grouping the plurality of prediction result values comprises:
grouping prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2), wherein each group includes two prediction result values and a difference value of the two prediction result values is N/2; and
grouping a predictor value selected from the predictor value N and the predictor value N-1 with the predictor value N/2-1, wherein one predictor value is selected from the predictor value N and the predictor value N-1 by:
the predicted result value N-1 is selected if the value on the data bit corresponding to the predicted result value N-1 is 1, and the predicted result value N is selected if the value on the data bit corresponding to the predicted result value N-1 is 0.
4. The method of claim 1, wherein selecting the actual result value from the plurality of selected predictor values based on the value on the data bit of the binary data corresponding to the selected predictor value if the smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1) comprises:
grouping the plurality of selected prediction result values;
selecting a predictor value from each set of predictor values according to a value on a data bit of the binary data corresponding to a smallest predictor value in each set of predictor values to obtain at least one selected predictor value;
repeating the grouping and the selecting until the number of the at least one selected predictor values obtained is 1, and determining the obtained selected predictor value as the actual result value.
5. The method of any of claims 1-4, wherein selecting a predictor value from each set of predictor values comprises:
selecting the minimum prediction result value in a group if the value of the data bit corresponding to the minimum prediction result value in that group is 1, and selecting the other prediction result value in the group if the value of the data bit corresponding to the minimum prediction result value in that group is 0.
6. A data processing apparatus, comprising:
a grouping unit configured to group a plurality of prediction result values in a case where a bit number of binary data is an even number, wherein each group includes two of the prediction result values, the prediction result values indicate a predicted number of leading zeros that the binary data has, and belong to 0 to N, N being the bit number of the binary data;
a selection unit configured to, in the case where the bit number of the binary data is an even number, select one prediction result value from each group of prediction result values according to a value on a data bit of the binary data corresponding to a smallest prediction result value in each group, to obtain a plurality of selected prediction result values, and to determine the smallest selected prediction result value as an actual result value if the smallest selected prediction result value among the plurality of selected prediction result values belongs to 0 to (N/2-1), the actual result value representing an actual number of leading zeros that the binary data has; wherein the selection unit selects the actual result value from the plurality of selected predictor values according to the value on the data bit of the binary data corresponding to the selected predictor value if the smallest selected predictor value of the plurality of selected predictor values does not belong to 0 to (N/2-1).
7. The data processing apparatus of claim 6, further comprising:
a correspondence unit for corresponding the plurality of predictor values to the data bits in the binary data, wherein:
the data bit corresponding to each prediction result value belonging to 0 to (N-2) satisfies the following condition: if the value on the data bit is 1 and the values on all bits higher than the data bit are 0, the prediction result value equals the number of bits higher than the data bit; and the data bit corresponding to the predictor values N-1 and N is the lowest data bit of the binary data.
8. The data processing apparatus according to claim 6, wherein the grouping unit groups the plurality of prediction result values by:
grouping prediction result values belonging to 0 to (N/2-2) and N/2 to (N-2), wherein each group includes two prediction result values and a difference value of the two prediction result values is N/2; grouping a predictor value selected from the predictor value N and the predictor value N-1 with the predictor value N/2-1, wherein one predictor value is selected from the predictor value N and the predictor value N-1 by:
the predicted result value N-1 is selected if the value on the data bit corresponding to the predicted result value N-1 is 1, and the predicted result value N is selected if the value on the data bit corresponding to the predicted result value N-1 is 0.
9. The data processing apparatus of claim 6, wherein the selection unit selects the actual result value from the plurality of selected predictor values by:
grouping the plurality of selected prediction result values;
selecting a predictor value from each set of predictor values according to a value on a data bit of the binary data corresponding to a smallest predictor value in each set of predictor values to obtain at least one selected predictor value;
repeating the grouping and the selecting until the number of the at least one selected predictor values obtained is 1, and determining the obtained selected predictor value as an actual result value.
10. The data processing apparatus according to any one of claims 6 to 9, wherein the selection unit selects one predictor value from each set of predictor values by:
selecting the minimum prediction result value in a group if the value of the data bit corresponding to the minimum prediction result value in that group is 1, and selecting the other prediction result value in the group if the value of the data bit corresponding to the minimum prediction result value in that group is 0.
11. The data processing apparatus of claim 10, wherein the selection unit comprises a tree multiplexer.
12. A machine-readable medium having stored thereon instructions which, when executed on a machine, cause the machine to perform the method of processing binary data according to any one of claims 1 to 5.
13. A system, comprising:
a memory for storing instructions for execution by one or more processors of the system, and
a processor, being one of the processors of the system, for performing the method of processing binary data according to any one of claims 1 to 5.
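
For illustration only, the method of claims 1 to 5 can be modelled in software roughly as follows. This is a minimal sketch and not the claimed hardware: the function names (bit_for, select, count_leading_zeros) are hypothetical, and because the claims do not fix how the selected values are regrouped in the repeated stage of claim 4, the sketch assumes that neighbouring values are paired after sorting and that an unpaired value is carried forward unchanged.

```python
def bit_for(value: int, k: int, n: int) -> int:
    """Return the data bit associated with prediction result value k (claim 2).

    Values 0..N-2 map to bit N-1-k (counted from the least significant bit);
    values N-1 and N both map to the lowest bit.
    """
    index = n - 1 - k if k <= n - 2 else 0
    return (value >> index) & 1


def select(value: int, group: tuple, n: int) -> int:
    """Claim 5: keep the smaller value if its data bit is 1, else the other."""
    low, high = min(group), max(group)
    return low if bit_for(value, low, n) else high


def count_leading_zeros(value: int, n: int) -> int:
    """Software model of the claimed selection for an even bit number n."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")

    half = n // 2
    # Claim 3: choose between N-1 and N using the lowest data bit.
    tail = n - 1 if bit_for(value, n - 1, n) else n
    # Claim 3: pairs whose members differ by N/2, plus (N/2-1, tail).
    groups = [(k, k + half) for k in range(half - 1)] + [(half - 1, tail)]
    selected = [select(value, g, n) for g in groups]

    # Claim 1: a smallest selected value in 0..N/2-1 is already the answer.
    smallest = min(selected)
    if smallest <= half - 1:
        return smallest

    # Claim 4: regroup and select until one value remains.
    # Assumption: pair neighbours after sorting; carry an odd one forward.
    while len(selected) > 1:
        selected.sort()
        nxt = [select(value, (selected[i], selected[i + 1]), n)
               for i in range(0, len(selected) - 1, 2)]
        if len(selected) % 2:
            nxt.append(selected[-1])
        selected = nxt
    return selected[0]


if __name__ == "__main__":
    # 8-bit input 00010110 has three leading zeros.
    print(count_leading_zeros(0b00010110, 8))
```

Each pairwise selection in this model corresponds to one two-input multiplexer steered by a single data bit, which is consistent with the tree multiplexer mentioned in claim 11; the loop is only a sequential stand-in for such a tree.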
CN201910114581.4A 2019-02-14 2019-02-14 Binary data processing method, and apparatus, medium, and system thereof Active CN109960486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114581.4A CN109960486B (en) 2019-02-14 2019-02-14 Binary data processing method, and apparatus, medium, and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910114581.4A CN109960486B (en) 2019-02-14 2019-02-14 Binary data processing method, and apparatus, medium, and system thereof

Publications (2)

Publication Number Publication Date
CN109960486A (en) 2019-07-02
CN109960486B (en) 2021-06-25

Family

ID=67023670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114581.4A Active CN109960486B (en) 2019-02-14 2019-02-14 Binary data processing method, and apparatus, medium, and system thereof

Country Status (1)

Country Link
CN (1) CN109960486B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1143218A (en) * 1994-09-29 1997-02-19 国际商业机器公司 Method and device for determining number of leading zero or 1 in binary data threshold
US6615228B1 (en) * 2000-05-30 2003-09-02 Hewlett-Packard Development Company, Lp Selection based rounding system and method for floating point operations
US6697828B1 (en) * 2000-06-01 2004-02-24 Sun Microsystems, Inc. Optimized method and apparatus for parallel leading zero/one detection
CN1503123A (en) * 2002-11-21 2004-06-09 智慧第一公司 Random number generator bit string filter and method
US7099910B2 (en) * 2003-04-07 2006-08-29 Sun Microsystems, Inc. Partitioned shifter for single instruction stream multiple data stream (SIMD) operations
CN101174200A (en) * 2007-05-18 2008-05-07 清华大学 5-grade stream line structure of floating point multiplier adder integrated unit
CN102122240A (en) * 2011-01-20 2011-07-13 东莞市泰斗微电子科技有限公司 Data type conversion circuit
US8260837B2 (en) * 2005-02-10 2012-09-04 International Business Machines Corporation Handling denormal floating point operands when result must be normalized
CN102664637A (en) * 2012-04-12 2012-09-12 北京中科晶上科技有限公司 Method and device for confirming leading zero number of binary data
CN108052307A (en) * 2017-11-27 2018-05-18 北京时代民芯科技有限公司 The advanced operation method and system of processor floating point unit leading zero quantity
CN108153513A (en) * 2016-12-06 2018-06-12 Arm 有限公司 Leading zero is predicted

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8495121B2 (en) * 2008-11-20 2013-07-23 Advanced Micro Devices, Inc. Arithmetic processing device and methods thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
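Design and Application of Leading Zero Prediction Logic; 朱光前 (Zhu Guangqian); China Master's Theses Full-text Database, Information Science and Technology; 2017-03-15; full text *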

Also Published As

Publication number Publication date
CN109960486A (en) 2019-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant