CN108804073B - Multi-flow real-time high-speed sequencing engine system - Google Patents
- Publication number
- CN108804073B (application CN201810497800.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- output
- sequencing
- unit
- basic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/02—Comparing digital values
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30021—Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
Abstract
The invention relates to a hardware implementation of a multi-flow real-time high-speed sequencing engine, comprising: a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm; a data selection unit, which calculates an address selection signal addr used to select the data to be output; and a data output unit, which outputs data according to the address selection signal addr and has two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence. Advantages: the design provides both a serial output and a memory-like output to suit different application scenarios, and the memory-like output allows all data to be looked up directly while saving I/O resources; in the data selection unit, result reuse is implemented with domino logic, which effectively improves resource utilization.
Description
Technical Field
The invention belongs to the field of high-speed sorting engines, and in particular relates to a multi-flow real-time high-speed sequencing engine system.
Background
Sorting is a classical and widely used class of algorithm whose goal is to turn an unordered sequence into an ordered one through repeated comparisons. With the rapid development of computer technology, sorting has become a basic building block of modern programming, and in the internet era the real-time requirements that applications place on sorting keep rising. In current operating systems, a significant share of CPU running time is spent on sorting. According to statistics, sorting-related operations account for 25%-50% of all computer jobs; in commercial batch-processing systems in particular, the CPU spends 15%-70% of its time on sorting tasks. Sorting is therefore of great practical importance, and because it involves a large number of data operations it is also relatively complex and difficult.
Sorting is a frequently used algorithm with high complexity and long running time. Its share of computer operations is comparable to that of basic arithmetic and Boolean operations, yet sorting in software is inefficient. Dedicated sorting hardware can therefore be built into the processor to greatly improve sorting speed and efficiency.
Bubble sort is currently the most common algorithm in hardware sorting implementations, but it performs many repeated comparisons. The serial odd-even sorting algorithm was developed to address this problem. Its core idea is to scan the array in two alternating passes. In the first pass, all odd-indexed pairs a[i] and a[i+1] are selected, where i is odd (i = 2k+1, k a natural number), and the two elements of each pair are swapped if their order is the reverse of the required order. In the second pass, all even-indexed pairs a[i] and a[i+1] are selected, where i is even (i = 2k, k a natural number), and the same operation is performed. The two passes are repeated until the array is sorted. The present design nevertheless uses the bubble sort algorithm, because it targets serially input sequences.
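The two-pass scan described for odd-even sorting can be sketched in software as follows. This is only an illustration of the background algorithm, not part of the claimed hardware; the function name odd_even_sort and the ascending sort order are assumptions made for the example.

```python
def odd_even_sort(values):
    """Sort a list in ascending order using alternating odd and even passes."""
    a = list(values)  # work on a copy so the input is left untouched
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        # First pass: pairs (a[i], a[i+1]) with odd i; second pass: even i.
        for start in (1, 0):
            for i in range(start, n - 1, 2):
                if a[i] > a[i + 1]:
                    # The pair is in reverse order, so swap it.
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
    return a
```

The loop stops once a full odd pass plus even pass produces no swap, which means every adjacent pair is already in order.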
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a multi-flow real-time high-speed sequencing engine system that performs real-time sorting efficiently and offers two output modes, serial and memory-like. This is achieved by the following technical solution:
The multi-flow real-time high-speed sequencing engine system comprises:
a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm;
a data selection unit, which calculates an address selection signal addr used to select the data to be output;
a data output unit, which outputs data according to the address selection signal addr and has two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence.
In a further design of the multi-flow real-time high-speed sequencing engine system, the basic sorting unit has four outputs DL, DS, VL and VS, where DL holds the larger number, DS holds the smaller number, and VL and VS indicate whether sorting of the corresponding data is complete. The DL and DS outputs of all basic sorting units are stored in an array D_pre, and the VL and VS outputs are stored in an array flag. For each D_pre[i] there is a signal pos[i] that indicates its position in the sorted sequence, where i is the index of the data item; pos[i] is given by formula (1).
In a further design of the multi-flow real-time high-speed sequencing engine system, the data selection unit mainly consists of two summation generators, given by formula (2) and formula (3) respectively,
where, in formulas (2) and (3), flag[i] denotes a flag signal and sum[i] denotes a summation result;
using domino logic, result reuse is implemented according to formula (4):
sum[i]=(sum[i-1]+flag[i])&z[i] (4)
If the position signal pos[i] matches the address selection signal addr, D_pre[i] is output; the array sel indicates whether D_pre[i] is selected as output, as shown in formula (5):
In a further design of the multi-flow real-time high-speed sequencing engine system, the summation generators are implemented as two parallel paths, which improves execution efficiency.
In a further design of the multi-flow real-time high-speed sequencing engine system, each pipeline stage is one basic sorting unit. Each basic sorting unit comprises four registers, a comparator and a state controller. The four registers are rega, regac, regb and regbc: rega stores data a to be sorted, and regac is the counter for data a; regb stores data b to be sorted, and regbc is the counter for data b. The comparator compares data a with data b and has two outputs, ab and bb: ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b. If a and b are equal, the most recently received data is retained and the other is output. The state controller is connected to rega and regb to form a state machine, which generates the signal representing the magnitude relation of the data in the basic sorting unit.
In a further design of the multi-flow real-time high-speed sequencing engine system, the comparator uses priority encoding and a hierarchical structure.
The invention has the following advantages:
The multi-flow real-time high-speed sequencing engine system provided by the invention uses a multi-stage pipeline and the bubble sort algorithm to sort the input sequence in real time.
The state machine design of the basic sorting unit simplifies the circuit structure and improves execution efficiency.
The comparator uses a hierarchical structure and priority encoding, which makes the structure easier to extend and the computation faster.
The data selection unit reuses results through domino logic, which effectively saves computing resources and time.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the real-time sorting engine.
FIG. 2 is a schematic diagram of the data selection unit architecture.
FIG. 3 is a schematic diagram of the basic sorting unit architecture.
FIG. 4 is a schematic diagram of the comparator structure.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, the multi-flow real-time high-speed sequencing engine system of this embodiment implements the sorting engine with a 16-stage pipeline and the bubble sort algorithm. Each pipeline stage is a basic sorting unit (BSU). A BSU receives data at its input Bin, compares it with the data it already holds, retains the larger number, and passes the smaller number to the next pipeline stage through its output Bout. After 16 pipeline stages, the 16 largest numbers are retained and the smallest number is output. Every data item entering the engine carries a count field that records its arrival time: a newly entered item has count b10000, and the count is incremented by 1 every clock cycle until it reaches b00000, which indicates that the item has stayed in the engine for sixteen cycles and must be extracted to the output port. Each BSU has two output ports, DL and DS, that hold the sorting result, and two flag bits, VL and VS, that indicate whether DL and DS respectively have been in the engine for sixteen cycles. The DL and DS outputs of all BSUs are stored in an array D_pre, and the VL and VS outputs are stored in an array flag; for each D_pre[i] there is a signal pos[i] that indicates its position in the sorted sequence.
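The keep-the-larger, forward-the-smaller behaviour of the stage chain, and the 5-bit arrival counter, can be illustrated with a simplified software model. This is a sketch under stated assumptions, not a cycle-accurate description of the hardware: each stage is modelled as holding a single retained value, a value is propagated through all stages at once rather than one stage per clock, and the names push, run and age_cycles are hypothetical.

```python
STAGES = 16  # pipeline depth used in the embodiment


def push(stages, value):
    """Feed one value through the chain; return the value leaving the last stage, or None."""
    moving = value
    for k in range(len(stages)):
        if stages[k] is None:      # an empty stage simply captures the value
            stages[k] = moving
            return None
        if moving > stages[k]:     # retain the larger, forward the smaller
            stages[k], moving = moving, stages[k]
    return moving


def run(data, depth=STAGES):
    """Push every value in turn; return (retained stage values, emitted values)."""
    stages = [None] * depth
    emitted = []
    for v in data:
        out = push(stages, v)
        if out is not None:
            emitted.append(out)
    return stages, emitted


def age_cycles(start=0b10000, width=5):
    """Count the clock cycles until a counter starting at b10000 wraps to b00000."""
    count, cycles = start, 0
    while count != 0:
        count = (count + 1) % (1 << width)
        cycles += 1
    return cycles
```

In this model the stages end up holding the 16 largest values in descending order, and age_cycles() confirms that a 5-bit counter starting at b10000 reaches b00000 after sixteen increments.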
The output data address selection signal addr selects the data to be output. If flag[i] is 1, meaning the corresponding data has been in the engine for sixteen cycles, and pos[i] equals addr, then D_pre[i] is output. The array sel indicates whether D_pre[i] is selected as output: sel[i] is 1 when flag[i] is 1 and pos[i] equals addr, and 0 otherwise.
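The selection rule can be sketched as below. The definition of pos used here is an assumption made only for illustration: pos[i] is taken to be the zero-based rank of entry i among the flagged entries (the number of set flags before index i), which is consistent with pos being non-decreasing in i but is not taken from formula (1). The function name select is hypothetical.

```python
def select(d_pre, flag, addr):
    """Return (sel, out): the one-hot selection array and the selected data item, if any."""
    # Assumption: pos[i] counts the set flags strictly before index i.
    pos, seen = [], 0
    for f in flag:
        pos.append(seen)
        seen += f
    # sel[i] is 1 only when the entry is flagged and its position matches addr.
    sel = [int(f == 1 and p == addr) for f, p in zip(flag, pos)]
    out = None
    for s, d in zip(sel, d_pre):
        if s:
            out = d
    return sel, out
```

With this reading, at most one entry is selected for any addr, so the output can be formed by combining the gated entries.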
As shown in FIG. 2, the hardware implementation of the data selection unit is based on equation (1). In the 32-bit flag signal, 16 bits are 1 and the other 16 bits are 0, and pos[i] ≥ pos[j] whenever i > j. pos[i] can therefore also be expressed as:
Denoting the summation result by sum[i], it further follows that:
Because the operation applied to flag[31-i] in formula (5) is the same as the operation applied to flag[i] in formula (4), the hardware units for the two paths are identical; they are called sum generators, and sum can be computed with a two-path parallel structure. Further, a signal z[i] is used to characterize flag: z[i] is 0 when flag[0] through flag[i] are all 0. Thus:
z[i]=z[i-1]|flag[i] (6)
For sum[i]:
sum[i]=(sum[i-1]+flag[i])&z[i] (7)
In this embodiment, sum[i] is computed from sum[i-1]; this result-reuse design based on multiple-output domino logic (MODL) improves computational efficiency and reduces wasted resources.
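The recurrences (6) and (7) can be traced in software as follows. This is a minimal sketch with several assumptions that are not stated in the text: the initial values z[-1] = 0 and sum[-1] = 0, a 6-bit word for sum, and the reading of "& z[i]" as the one-bit z[i] replicated across the whole sum word. The function name sum_generator is hypothetical.

```python
def sum_generator(flag, width=6):
    """Compute z[i] and sum[i] for every i using recurrences (6) and (7)."""
    mask = (1 << width) - 1
    z_prev, sum_prev = 0, 0          # assumed initial values z[-1] and sum[-1]
    z, s = [], []
    for f in flag:
        z_i = z_prev | f                           # equation (6)
        gate = mask if z_i else 0                  # z[i] replicated across the word
        sum_i = ((sum_prev + f) & mask) & gate     # equation (7)
        z.append(z_i)
        s.append(sum_i)
        z_prev, sum_prev = z_i, sum_i              # each result is reused by the next step
    return z, s
```

Under these assumptions sum[i] is a running count of the set flags up to index i, and each step reuses the previous result instead of recomputing the whole sum. The same function could serve a second path by feeding it the flags in reversed order, which is one possible reading of the two-path structure.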
As shown in FIG. 3, each BSU contains four registers: rega, regac, regb and regbc. rega stores data a, and regac is the counter for data a; regb stores data b, and regbc is the counter for data b. In the BSU design, a state signal st represents the two states of the BSU. st = 0 means that data a is smaller than data b: data a is output, and the newly entered number is stored in rega. st = 1 means that data b is smaller than data a: data b is output, and the newly entered number is stored in regb. A comparator in the BSU compares data a with data b and has two outputs, ab and bb: ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b. If a and b are equal, the most recently received data is retained and the other is output. Because the previous value of st indicates whether the last data sent out came from a or from b, and hence whether the most recently received data is a or b, st can be computed from its previous value st_pre:
As the analysis of equation (8) shows, st for each cycle can be computed by a state machine consisting of a register and a feedback circuit.
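One way to read the state update is sketched below. The Boolean expression in next_st is an assumption derived from the prose rules (a > b outputs b, b > a outputs a, and on a tie the most recently received item is kept); it is not claimed to be the exact form of equation (8). The counters regac and regbc are omitted, and the names compare, next_st and bsu_step are hypothetical.

```python
def compare(a, b):
    """Comparator outputs: ab = 1 if a > b, bb = 1 if b > a, both 0 if equal."""
    return int(a > b), int(b > a)


def next_st(ab, bb, st_pre):
    """Assumed state update: st = ab OR (NOT bb AND NOT st_pre)."""
    # On a tie, st_pre = 0 means a was replaced last, so a is newest and b is output.
    return ab | ((1 - bb) & (1 - st_pre))


def bsu_step(a, b, st_pre, new):
    """One step of a simplified BSU: returns (output, a, b, st) after storing new."""
    ab, bb = compare(a, b)
    st = next_st(ab, bb, st_pre)
    if st == 0:
        out, a = a, new   # a is output and the new number goes into rega
    else:
        out, b = b, new   # b is output and the new number goes into regb
    return out, a, b, st
```

Because st depends only on the comparator outputs and its own previous value, a single register with feedback is enough to hold the state.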
The data length used in this embodiment is 20 bits, consisting of an upper 8 bits, denoted fd, and a lower 12 bits, denoted sd. When two data items are compared, the one with the larger fd is the larger number; when the fd values are equal, the one with the larger sd is the larger number. Referring to FIG. 4, the hierarchical comparator of this embodiment uses five four-bit basic comparators in the first layer, each comparing one four-bit group of data a and b, to produce two five-bit values; in the second layer, one five-bit basic comparator takes these values and produces the two signals ab and bb that indicate the relative magnitude of a and b. Each basic comparator uses priority encoding. To compare two data items a and b, x = a ^ b (the bitwise XOR of a and b) is computed first; x is then priority-encoded to produce y, which keeps only the highest 1 bit of x and sets all other bits to zero. The two outputs are:
ab=|(a&y) (9)
bb=|(b&y) (10)
The priority encoding described above makes the comparator faster.
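The basic comparator of equations (9) and (10) and the two-layer arrangement can be modelled as follows. This is a sketch under assumptions: x is taken to be the bitwise XOR of a and b, the two five-bit values fed to the second layer are taken to be the concatenated ab and bb outputs of the five first-layer comparators (most significant group first), and the function names are hypothetical.

```python
def basic_compare(a, b):
    """Priority-encoded comparator: returns (ab, bb)."""
    x = a ^ b                                     # bits where a and b differ
    y = 1 << (x.bit_length() - 1) if x else 0     # keep only the highest 1 bit of x
    ab = int((a & y) != 0)                        # equation (9): ab = |(a & y)
    bb = int((b & y) != 0)                        # equation (10): bb = |(b & y)
    return ab, bb


def hierarchical_compare(a, b, groups=5, group_bits=4):
    """Two-layer comparator for 20-bit data built from basic comparators."""
    mask = (1 << group_bits) - 1
    word_a = word_b = 0
    for g in reversed(range(groups)):             # most significant group first
        shift = g * group_bits
        ab_g, bb_g = basic_compare((a >> shift) & mask, (b >> shift) & mask)
        word_a = (word_a << 1) | ab_g             # five-bit value built from ab outputs
        word_b = (word_b << 1) | bb_g             # five-bit value built from bb outputs
    return basic_compare(word_a, word_b)          # second layer decides the result
```

The highest differing bit decides the comparison: whichever operand has a 1 at that position is the larger one, and the second layer applies the same rule to the per-group results.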
After synthesis, placement and routing in a 40 nm CMOS process, the critical path is 1.34 ns, the clock frequency reaches about 750 MHz, and the total area is 11389.59 square micrometers, which gives some performance advantage over existing hardware sorting accelerator designs.
This embodiment provides an effective hardware implementation of a real-time sorting engine. Using the bubble sort algorithm, it accepts data continuously in every clock cycle and keeps the 16 largest data items sorted. Two outputs, serial and memory-like, are provided for different application scenarios; the memory-like output allows all data to be looked up directly and saves I/O resources. The 16-stage pipeline of basic sorting units raises the operating frequency and enables real-time sorting, and the state machine in each basic sorting unit further improves execution efficiency. In the data selection unit, result reuse is implemented with domino logic, which effectively improves resource utilization, and the data positions are computed by a two-path parallel structure obtained through formula derivation, which improves computational efficiency. The comparator in the basic sorting unit uses priority encoding and a two-level hierarchical structure, which makes it faster.
The multi-flow real-time high-speed sequencing engine system provided by the invention has been described in detail above to aid understanding of the invention and its core idea. Those skilled in the art may make many modifications and derivations in specific implementations based on the core idea of the invention. Accordingly, this description should not be construed as limiting the invention.
Claims (5)
1. A multi-flow real-time high-speed sequencing engine system, comprising:
a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm;
a data selection unit, which calculates an address selection signal addr used to select the data to be output;
a data output unit, which outputs data according to the address selection signal addr and has two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence; the basic sorting unit has four outputs DL, DS, VL and VS, where DL holds the larger number, DS holds the smaller number, and VL and VS indicate whether sorting of the corresponding data is complete; the DL and DS outputs of all basic sorting units are stored in an array D_pre, and the VL and VS outputs are stored in an array flag; for each D_pre[i] there is a signal pos[i] that indicates its position in the sorted sequence, where i is the index of the data item, and pos[i] is given by formula (1);
each pipeline stage is one basic sorting unit, which receives data at an input Bin, compares it with the data already held in the basic sorting unit, retains the larger number, and outputs the smaller number to the next pipeline stage through an output Bout; after 16 pipeline stages, the 16 largest numbers are retained and the smallest number is output.
2. The multi-flow real-time high-speed sequencing engine system of claim 1, wherein: the data selection unit mainly consists of two summation generators, given by formula (2) and formula (3) respectively,
where, in formulas (2) and (3), flag[x] denotes a flag signal and sum[i] denotes a summation result;
using domino logic, result reuse is implemented according to formula (4):
sum[i]=(sum[i-1]+flag[i])&z[i] (4)
if the position signal pos[i] matches the address selection signal addr, D_pre[i] is output; the array sel indicates whether D_pre[i] is selected as output, as shown in formula (5):
3. The multi-flow real-time high-speed sequencing engine system of claim 2, wherein: the summation generators are implemented as two parallel paths, which improves execution efficiency.
4. The multi-flow real-time high-speed sequencing engine system of claim 1, wherein: each pipeline stage is one basic sorting unit; the basic sorting unit comprises four registers, a comparator and a state controller; the four registers are rega, regac, regb and regbc, where rega stores data a to be sorted, regac is the counter for data a, regb stores data b to be sorted, and regbc is the counter for data b; the comparator compares data a with data b and has two outputs, ab and bb, where ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b; if a and b are equal, the most recently received data is retained and the other is output; the state controller is connected to rega and regb to form a state machine, which generates the signal representing the magnitude relation of the data in the basic sorting unit.
5. The multi-flow real-time high-speed sequencing engine system of claim 4, wherein: the comparator uses priority encoding and a hierarchical structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810497800.7A CN108804073B (en) | 2018-05-21 | 2018-05-21 | Multi-flow real-time high-speed sequencing engine system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804073A (en) | 2018-11-13 |
CN108804073B (en) | 2021-12-17 |
Family
ID=64092832
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201810497800.7A (CN108804073B) | Active | 2018-05-21 | 2018-05-21 | Multi-flow real-time high-speed sequencing engine system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804073B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4414643A (en) * | 1981-05-15 | 1983-11-08 | The Singer Company | Ordering system for pairing feature intersections on a simulated radar sweepline |
CN101192847A (en) * | 2007-08-13 | 2008-06-04 | 中兴通讯股份有限公司 | A peak search and sorting device and peak sorting method |
CN102073620A (en) * | 2009-11-20 | 2011-05-25 | 扬智电子(上海)有限公司 | Fast Fourier converter, reverse fast Fourier converter and reverse fast method thereof |
CN103969635A (en) * | 2014-04-30 | 2014-08-06 | 上海航天电子通讯设备研究所 | Meteorologic signal processing IP core of low-altitude monitoring radar and real-time data sorting method thereof |
CN104317549A (en) * | 2014-10-15 | 2015-01-28 | 中国航天科技集团公司第九研究院第七七一研究所 | Cascade structure circuit and method for realizing data sorting |
CN104866286A (en) * | 2015-06-02 | 2015-08-26 | 电子科技大学 | OpenCL and SoC-FPGA-Based K neighbor sorting accelerating method |
CN106462386A (en) * | 2014-05-30 | 2017-02-22 | 华为技术有限公司 | Parallel mergesorting |
CN106775573A (en) * | 2016-11-23 | 2017-05-31 | 北京电子工程总体研究所 | A kind of potential target sort method based on FPGA |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002229772A (en) * | 2001-02-06 | 2002-08-16 | Sony Corp | Sort processing method and sort processor |
Non-Patent Citations (2)
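Title |
---|
"Digit-Serial Pipeline Sorter Architecture"; Yun-Nan Chang; Journal of Signal Processing Systems; 2009-12-30; No. 61; pp. 241-249 * |
"Quicksort algorithm based on four parallel modes" (基于4种并行模式的快速排序算法); Zhang Tianyang, Chen Hua (张天阳, 陈华); Journal of Chengdu University of Information Technology (成都信息工程大学学报); 2018-02-28; Vol. 33, No. 1; pp. 13-17 * |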
Also Published As
Publication number | Publication date |
---|---|
CN108804073A (en) | 2018-11-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||