CN108804073B - Multi-flow real-time high-speed sequencing engine system

Publication number
CN108804073B
Authority
CN
China
Legal status
Active
Application number
CN201810497800.7A
Other languages
Chinese (zh)
Other versions
CN108804073A (en)
Inventor
李丽
樊朝煜
刘禹楠
傅玉祥
何书专
Current Assignee
Nanjing University
Original Assignee
Nanjing University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/02 Comparing digital values
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30021 Compare instructions, e.g. Greater-Than, Equal-To, MINMAX

Abstract

The invention relates to a hardware implementation of a multi-pipeline real-time high-speed sorting engine, comprising: a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm; a data selection unit, which computes the address selection signal addr used to select the data to be output; and a data output unit, which outputs data according to the address selection signal addr and provides two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence. Advantages: the two output modes, serial and memory-like, serve different application scenarios, and the memory-like output mode allows every data item to be looked up directly while saving I/O resources; in the design of the data selection unit, data reuse is implemented with domino logic, which effectively improves resource utilization.

Description

Multi-flow real-time high-speed sequencing engine system
Technical Field
The invention belongs to the field of high-speed sorting engines, and in particular relates to a multi-pipeline real-time high-speed sorting engine system.
Background
Sorting is a classical and widely used class of algorithms whose aim is to turn an unordered sequence into an ordered one through repeated comparisons. With the rapid development of computer technology, sorting has become a basic building block of programming, and in the internet era modern applications place ever higher real-time demands on it. In current operating systems, a significant share of CPU running time is spent on sorting. Statistically, sorting-related operations account for 25%-50% of all computer jobs; in commercial batch-processing systems in particular, the CPU spends 15%-70% of its time on sorting tasks. Sorting is therefore of great practical importance, and because it involves a large number of data operations it is also comparatively complex and difficult.
Sorting is a complex, time-consuming and frequently used operation whose share of computer workloads is comparable to that of basic arithmetic and Boolean operations, yet sorting in software is inefficient. Dedicated sorting hardware can therefore be designed into the processor to greatly improve sorting speed and efficiency.
Bubble sort is currently the most common choice for hardware sorting implementations, but it performs many repeated comparisons. The serial odd-even sort algorithm was introduced to address this problem. Its core idea is to scan the array in two alternating passes. In the first pass, all odd-indexed pairs a[i] and a[i+1] are selected, where i is odd (i = 2k + 1, k a natural number), and each pair is swapped if its two values are in the wrong order. In the second pass, all even-indexed pairs a[i] and a[i+1], where i is even (i = 2k, k a natural number), are selected and treated in the same way. The two passes are repeated until the array is sorted. The present design nevertheless uses the bubble sort algorithm, because it targets serially input sequences.
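For reference, the two-pass odd-even scheme described above can be sketched in software as follows. This is a minimal illustration only and is not part of the claimed design; the function name odd_even_sort and the choice of ascending order are assumptions.

```python
def odd_even_sort(values):
    """Serial odd-even sort: alternate odd-indexed and even-indexed passes."""
    a = list(values)          # work on a copy
    swapped = True
    while swapped:            # repeat both passes until nothing moves
        swapped = False
        for start in (1, 0):  # first pass: odd i, second pass: even i
            for i in range(start, len(a) - 1, 2):
                if a[i] > a[i + 1]:   # pair is in the wrong order
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
    return a


result = odd_even_sort([5, 1, 4, 2, 8, 0, 3])
```

Each pass touches disjoint pairs, which is why the scheme lends itself to parallel hardware when the whole array is available at once.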
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a multi-pipeline real-time high-speed sorting engine system that performs real-time sorting effectively and offers two output modes, serial and memory-like. It is realized by the following technical scheme:
The multi-pipeline real-time high-speed sorting engine system comprises:
a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm;
a data selection unit, which computes the address selection signal addr used to select the data to be output;
a data output unit, which outputs data according to the address selection signal addr and provides two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence.
In a further design of the system, the basic sorting unit has four outputs, DL, DS, VL and VS: DL holds the larger number, DS holds the smaller number, and VL and VS mark whether sorting of the corresponding data is complete. The DL and DS outputs of all basic sorting units are stored in an array Dpre, and VL and VS are stored in an array flag. Each Dpre[i] has a signal pos[i] giving its position in the sorted sequence, where i is the index of the data item; pos[i] is given by formula (1).
[Formula (1), defining pos[i]: equation image not reproduced in this text]
In a further design of the system, the data selection unit consists mainly of two summation generators, given by formula (2) and formula (3) respectively:
[Formula (2): equation image not reproduced in this text]
[Formula (3): equation image not reproduced in this text]
In formulas (2) and (3), flag[i] is the flag signal and sum[i] is the summation result.
Using domino logic, data reuse is implemented according to formula (4):
sum[i] = (sum[i-1] + flag[i]) & z[i] (4)
If the position signal pos[i] matches the address selection signal addr, Dpre[i] is output; the array sel records whether Dpre[i] is selected as output, as shown in formula (5):
[Formula (5), defining sel[i]: equation image not reproduced in this text]
In a further design of the system, the summation generator is implemented as a two-way parallel design, which improves execution efficiency.
In a further design of the system, each pipeline stage is one basic sorting unit, which comprises four registers, one comparator and a state controller. The four registers are rega, regac, regb and regbc: rega stores data a to be sorted and regac is the counter for data a; regb stores data b to be sorted and regbc is the counter for data b. The comparator compares data a with data b and has two outputs, ab and bb: ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b. If a and b are equal, the most recently received data is retained and the other is output. The state controller is connected to rega and regb to form a state machine, which generates the signal representing the magnitude relation of the data in the basic sorting unit.
In a further design of the system, the comparator uses priority encoding and a hierarchical structure.
The invention has the following advantages:
The multi-pipeline real-time high-speed sorting engine system provided by the invention uses a multi-stage pipeline design and the bubble sort algorithm to sort the input sequence in real time.
The state-machine design of the basic sorting unit simplifies the circuit structure and improves execution efficiency.
The comparator uses a hierarchical design and priority encoding, which makes the structure easier to extend and the computation faster.
The data selection unit uses domino logic to reuse results, which effectively saves computing resources and time.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the real-time sorting engine.
FIG. 2 is a schematic diagram of the data selection module architecture.
FIG. 3 is a schematic diagram of the basic sorting unit architecture.
FIG. 4 is a schematic diagram of the comparator structure.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, the multi-pipeline real-time high-speed sorting engine system of this embodiment implements the sorting engine with a 16-stage pipeline and the bubble sort algorithm. Each pipeline stage is a Basic Sorting Unit (BSU). A BSU receives data at its input Bin, compares it with the data it already holds, retains the larger number, and passes the smaller number to the next pipeline stage through its output Bout. After 16 pipeline stages, the 16 largest numbers are retained and the minimum number is output. Every data item entering the engine carries a count flag recording its arrival time: a newly entered item has count b10000, and the count is incremented by 1 every clock cycle until it reaches b00000, which indicates that the item has stayed in the engine for sixteen cycles and must be extracted to the output port. Each BSU has two output ports, DL and DS, which hold the sorting result, and flag bits VL and VS, which indicate whether DL and DS, respectively, have been in the engine for sixteen cycles. The DL and DS outputs of all BSUs are stored in an array Dpre, and VL and VS are stored in an array flag. For each Dpre[i] there is a signal pos[i] giving its position in the sorted sequence:
[Formula (1), defining pos[i]: equation image not reproduced in this text]
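The retain-the-larger, pass-the-smaller behavior of the pipeline and the 5-bit arrival counter can be sketched as a small behavioral model. This is a simplified sketch under stated assumptions, not the hardware itself: each stage is modeled as holding a single value, the per-cycle timing is collapsed into one function call, and the names push, STAGES and cycles_until_expiry are hypothetical.

```python
STAGES = 16  # pipeline depth used in the embodiment


def push(stages, value):
    """Feed one value through all stages; return the value that falls out, if any.

    Assumption: each stage holds a single number (None while still empty).
    """
    carry = value
    for k in range(len(stages)):
        if stages[k] is None:       # an empty stage simply captures the value
            stages[k] = carry
            return None
        if carry > stages[k]:       # keep the larger number in the stage
            stages[k], carry = carry, stages[k]
    return carry                    # the smallest value leaves the last stage


def cycles_until_expiry(start=0b10000, width=5):
    """Count clock cycles until a counter starting at b10000 wraps to b00000."""
    count, cycles = start, 0
    while count != 0:
        count = (count + 1) % (1 << width)
        cycles += 1
    return cycles


stages = [None] * STAGES
stream = [(37 * n) % 101 for n in range(40)]   # arbitrary stream of distinct values
dropped = [v for v in (push(stages, x) for x in stream) if v is not None]
```

After the whole stream has been fed in, stages holds the 16 largest values in descending order and dropped holds everything else; the counter helper illustrates why starting at b10000 corresponds to a residence time of sixteen cycles.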
The output data address selection signal addr selects the data to be output. If flag[i] is 1, meaning that the corresponding data has been in the engine for sixteen cycles, and pos[i] equals addr, then Dpre[i] is output. The array sel records whether Dpre[i] is selected as output:
sel[i] = 1 if flag[i] = 1 and pos[i] = addr; otherwise sel[i] = 0 (2)
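The selection rule just described can be sketched as follows. This is a minimal illustration, assuming that pos has already been computed elsewhere and is passed in as a list; the function name select_output and the sample values are hypothetical.

```python
def select_output(d_pre, flag, pos, addr):
    """Return (sel, selected): sel[i] is 1 only when flag[i] is 1 and pos[i] equals addr."""
    sel = [1 if f == 1 and p == addr else 0 for f, p in zip(flag, pos)]
    selected = [d for d, s in zip(d_pre, sel) if s == 1]
    return sel, selected


# Hypothetical sample data: entry 1 has the requested position but flag 0,
# so it is not selected; entry 2 satisfies both conditions.
d_pre = [90, 75, 60, 42]
flag = [1, 0, 1, 1]
pos = [0, 1, 1, 2]
sel, selected = select_output(d_pre, flag, pos, addr=1)
```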
As shown in FIG. 2, the hardware implementation of the data selection unit is based on formula (1). In the 32-bit flag signal, 16 bits are 1 and the other 16 bits are 0, and pos[i] is greater than or equal to pos[j] whenever i > j. Hence pos[i] can also be expressed as:
[Formula (3), alternative expression for pos[i]: equation image not reproduced in this text]
Denoting the summation result by sum[i], it further follows that:
[Formula (4): equation image not reproduced in this text]
[Formula (5): equation image not reproduced in this text]
Because the computation on flag[31-i] in formula (5) has the same form as the computation on flag[i] in formula (4), the hardware units for the two paths are identical. They are called sum generators, and sum can therefore be computed with a two-way parallel structure. Further, a signal z[i] is used to characterize flag: z[i] is 0 when flag[0] through flag[i] are all 0. Thus:
z[i] = z[i-1] | flag[i] (6)
and for sum[i]:
sum[i] = (sum[i-1] + flag[i]) & z[i] (7)
In this embodiment, sum[i] is computed from sum[i-1]. This result-reuse design, based on domino logic technology (MODL), improves computational efficiency and reduces wasted resources.
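The recurrences (6) and (7) can be traced with a short sketch. The initial values z[-1] = 0 and sum[-1] = 0, and the reading of "& z[i]" as gating the whole sum by the single bit z[i], are assumptions made for illustration; the function name sum_generator is hypothetical.

```python
def sum_generator(flag):
    """Evaluate z[i] = z[i-1] | flag[i] and sum[i] = (sum[i-1] + flag[i]) & z[i]."""
    z, total = [], []
    z_prev, sum_prev = 0, 0                   # assumed initial values
    for f in flag:
        z_i = z_prev | f                      # equation (6)
        s_i = (sum_prev + f) if z_i else 0    # equation (7), z[i] used as a gate
        z.append(z_i)
        total.append(s_i)
        z_prev, sum_prev = z_i, s_i           # each result is reused by the next step
    return z, total


z, s = sum_generator([0, 0, 1, 0, 1, 1])
```

Under these assumptions sum[i] equals the number of ones in flag[0] through flag[i], and each step needs only the previous result plus one flag bit, which is the reuse the text refers to.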
As shown in FIG. 3, each BSU contains four registers: rega, regac, regb and regbc. rega stores data a and regac is the counter for data a; regb stores data b and regbc is the counter for data b. The state signal st represents the two states of the BSU. st = 0 means data a is smaller than data b: data a is output and the newly entered number is stored in rega. st = 1 means data b is smaller than data a: data b is output and the newly entered number is stored in regb. A comparator in the BSU compares data a with data b and has two outputs, ab and bb: ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b. If a and b are equal, the most recently received data is retained and the other is output. Because the previous state of st indicates whether the last data sent out came from a or b, and hence whether the most recently received data is in a or b, st can be computed from its previous value stpre:
[Formula (8), defining st in terms of stpre: equation image not reproduced in this text]
Equation (8) shows that st for each cycle can be computed by a state machine consisting of a register and a feedback circuit.
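The state behavior described in the prose can be sketched as below. Because equation (8) is not reproduced in this text, the tie rule here is inferred from the description (on a tie, keep the register that received the newest data) and should be read as an assumption; the names next_state and BSU, and the initial state st = 0, are likewise hypothetical.

```python
def next_state(ab, bb, st_pre):
    """Inferred state update: 1 means output b, 0 means output a."""
    if ab:                 # a > b, so b is the smaller value and is output
        return 1
    if bb:                 # b > a, so a is the smaller value and is output
        return 0
    return 1 - st_pre      # tie: output the register that did NOT receive the newest data


class BSU:
    """Behavioral model of one basic sorting unit (counters omitted)."""

    def __init__(self, a, b):
        self.a, self.b, self.st = a, b, 0   # assumed initial state

    def step(self, new_value):
        ab, bb = int(self.a > self.b), int(self.b > self.a)
        self.st = next_state(ab, bb, self.st)
        if self.st == 0:
            out, self.a = self.a, new_value   # a leaves, new data goes to rega
        else:
            out, self.b = self.b, new_value   # b leaves, new data goes to regb
        return out


unit = BSU(5, 3)
outputs = [unit.step(7), unit.step(7), unit.step(1)]
```

In the third step both registers hold 7; the value in rega arrived most recently, so it is retained and the copy in regb is output.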
The data width used in this embodiment is 20 bits, consisting of an upper 8 bits, denoted fd, and a lower 12 bits, denoted sd. When two data items are compared, the one with the larger fd is the larger number; when fd is equal, the one with the larger sd is the larger number. Referring to FIG. 4, the hierarchical comparator of this embodiment uses five four-bit basic comparators in the first layer, each comparing one four-bit group of data a and b, to generate two five-bit words. The second layer uses one five-bit basic comparator on these words to generate the two signals ab and bb, which indicate the relative size of data a and b. Each basic comparator uses priority encoding. To compare two data items a and b, it first computes:
x = a ^ b
x is then priority-encoded to produce y, which keeps only the most significant 1 bit of x and sets all other bits to zero. The two outputs are:
ab=|(a&y) (9)
bb=|(b&y) (10)
This priority-encoding technique makes the comparator faster.
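The priority-encoding comparison and the two-layer structure can be sketched in software as follows. This is an illustrative model only: it assumes x is the bitwise XOR of a and b (so that the top set bit of x marks the most significant position where the inputs differ), and the function names basic_compare and hierarchical_compare are hypothetical.

```python
def basic_compare(a, b):
    """Return (ab, bb) using x = a ^ b and a one-hot mask y of x's top set bit."""
    x = a ^ b
    y = 1 << (x.bit_length() - 1) if x else 0   # priority encoding of x
    ab = int((a & y) != 0)                      # ab = |(a & y), equation (9)
    bb = int((b & y) != 0)                      # bb = |(b & y), equation (10)
    return ab, bb


def hierarchical_compare(a, b):
    """Compare two 20-bit values with five 4-bit comparators and one 5-bit comparator."""
    a_word = b_word = 0
    for k in range(4, -1, -1):                  # most significant 4-bit group first
        group_a = (a >> (4 * k)) & 0xF
        group_b = (b >> (4 * k)) & 0xF
        ab, bb = basic_compare(group_a, group_b)
        a_word = (a_word << 1) | ab             # first-layer results form 5-bit words
        b_word = (b_word << 1) | bb
    return basic_compare(a_word, b_word)        # second layer decides the result


samples = [0, 1, 0xF, 0x10, 0x12345, 0x12355, 0x7FFFF, 0x80000, 0xFFFFF]
results = {(a, b): hierarchical_compare(a, b) for a in samples for b in samples}
```

Splitting the comparison into groups keeps each priority encoder narrow, which is one way to read the claim that the hierarchical structure is easier to extend.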
After synthesis, placement and routing in a 40 nm CMOS process, the critical path is 1.34 ns, the clock frequency reaches 750 MHz, and the total area is 11389.59 square micrometers, which offers some performance advantage over existing hardware sorting accelerator designs.
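As a quick consistency check on the figures above, the maximum clock frequency is the reciprocal of the critical path. The variable names are illustrative, and the calculation ignores any clocking margin.

```python
critical_path_ns = 1.34                 # critical path reported in the text
f_max_mhz = 1e3 / critical_path_ns      # 1 / 1.34 ns expressed in MHz
```

The result is roughly 746 MHz, in line with the reported figure of about 750 MHz.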
This embodiment provides an effective hardware implementation of a real-time sorting engine. Using the bubble sort algorithm, it accepts a new data item every clock cycle and keeps the current 16 largest items sorted. Two outputs, serial and memory-like, serve different application scenarios; the memory-like output mode allows every data item to be looked up directly and saves I/O resources. The 16-stage pipeline of basic sorting units raises the operating frequency and enables real-time sorting, and the state-machine design in the basic sorting unit further improves execution efficiency. In the data selection unit, data reuse is implemented with domino logic, which effectively improves resource utilization, and the data positions are computed by a two-way parallel structure obtained through formula derivation, which improves computational efficiency. The comparator in the basic sorting unit uses a priority-encoding algorithm and a two-level hierarchical structure, which makes it faster.
The multi-pipeline real-time high-speed sorting engine system provided by the invention has been described in detail above to aid understanding of the invention and its core idea. Those skilled in the art may make many modifications and variations to the specific implementation based on that core idea. This description should therefore not be construed as limiting the invention.

Claims (5)

1. A multi-pipeline real-time high-speed sorting engine system, comprising:
a basic sorting unit, which performs real-time sorting through a pipeline using the bubble sort algorithm;
a data selection unit, which computes the address selection signal addr used to select the data to be output;
a data output unit, which outputs data according to the address selection signal addr and provides two outputs: a serial output, which outputs the minimum number, and a memory-like output, which outputs the sorted sequence; the basic sorting unit has four outputs, DL, DS, VL and VS, wherein DL holds the larger number, DS holds the smaller number, and VL and VS mark whether sorting of the corresponding data is complete; the DL and DS outputs of all basic sorting units are stored in an array Dpre, and VL and VS are stored in an array flag; each Dpre[i] has a signal pos[i] giving its position in the sorted sequence, where i is the index of the data item, and pos[i] is given by formula (1);
[Formula (1), defining pos[i]: equation image not reproduced in this text]
each pipeline stage is one basic sorting unit, which receives data at an input Bin, compares it with the data already in the basic sorting unit, retains the larger number, and outputs the smaller number to the next pipeline stage through an output Bout; after 16 pipeline stages, the 16 largest numbers are retained and the minimum number is output.
2. The multi-pipeline real-time high-speed sorting engine system of claim 1, wherein: the data selection unit consists mainly of two summation generators, given by formula (2) and formula (3) respectively,
[Formula (2): equation image not reproduced in this text]
[Formula (3): equation image not reproduced in this text]
in formulas (2) and (3), flag[x] is the flag signal and sum[i] is the summation result;
using domino logic, data reuse is implemented according to formula (4);
sum[i] = (sum[i-1] + flag[i]) & z[i] (4)
if the position signal pos[i] matches the address selection signal addr, Dpre[i] is output, and the array sel records whether Dpre[i] is selected as output, as shown in formula (5):
[Formula (5), defining sel[i]: equation image not reproduced in this text]
3. The multi-pipeline real-time high-speed sorting engine system of claim 2, wherein: the summation generator is implemented as a two-way parallel design, which improves execution efficiency.
4. The multi-pipeline real-time high-speed sorting engine system of claim 1, wherein: each pipeline stage is one basic sorting unit, which comprises four registers, one comparator and a state controller; the four registers are rega, regac, regb and regbc, wherein rega stores data a to be sorted and regac is the counter for data a, and regb stores data b to be sorted and regbc is the counter for data b; the comparator compares data a with data b and has two outputs, ab and bb, wherein ab = 1 means a is greater than b, bb = 1 means b is greater than a, and ab = bb = 0 means a equals b; if a and b are equal, the most recently received data is retained and the other is output; the state controller is connected to rega and regb to form a state machine, which generates the signal representing the magnitude relation of the data in the basic sorting unit.
5. The multi-pipeline real-time high-speed sorting engine system of claim 4, wherein: the comparator uses priority encoding and a hierarchical structure.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497800.7A CN108804073B (en) 2018-05-21 2018-05-21 Multi-flow real-time high-speed sequencing engine system

Publications (2)

Publication Number Publication Date
CN108804073A CN108804073A (en) 2018-11-13
CN108804073B true CN108804073B (en) 2021-12-17

Family

ID=64092832

Country Status (1)

Country Link
CN (1) CN108804073B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4414643A (en) * 1981-05-15 1983-11-08 The Singer Company Ordering system for pairing feature intersections on a simulated radar sweepline
CN101192847A (en) * 2007-08-13 2008-06-04 中兴通讯股份有限公司 A peak search and sorting device and peak sorting method
CN102073620A (en) * 2009-11-20 2011-05-25 扬智电子(上海)有限公司 Fast Fourier converter, reverse fast Fourier converter and reverse fast method thereof
CN103969635A (en) * 2014-04-30 2014-08-06 上海航天电子通讯设备研究所 Meteorologic signal processing IP core of low-altitude monitoring radar and real-time data sorting method thereof
CN104317549A (en) * 2014-10-15 2015-01-28 中国航天科技集团公司第九研究院第七七一研究所 Cascade structure circuit and method for realizing data sorting
CN104866286A (en) * 2015-06-02 2015-08-26 电子科技大学 OpenCL and SoC-FPGA-Based K neighbor sorting accelerating method
CN106462386A (en) * 2014-05-30 2017-02-22 华为技术有限公司 Parallel mergesorting
CN106775573A (en) * 2016-11-23 2017-05-31 北京电子工程总体研究所 A kind of potential target sort method based on FPGA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002229772A (en) * 2001-02-06 2002-08-16 Sony Corp Sort processing method and sort processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Digit-Serial Pipeline Sorter Architecture"; Yun-Nan Chang; Journal of Signal Processing Systems; 20091230 (No. 61); pp. 241-249 *
"基于4种并行模式的快速排序算法"; 张天阳, 陈华; 《成都信息工程大学学报》; 20180228; Vol. 33 (No. 1); pp. 13-17 *

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant