CN116670639A - Processor, sorting method and electronic equipment - Google Patents

Publication number
CN116670639A
Authority
CN
China
Prior art keywords
sequences
read
data
ordered
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180088003.3A
Other languages
Chinese (zh)
Inventor
杨升
刘虎
林强
杜幸芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116670639A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
    • G06F7/24 Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers; sorting methods in general

Abstract

The application provides a processor, a sorting method, and an electronic device that can reduce the number of instructions used during sorting and thereby improve sorting efficiency. The processor includes an instruction storage circuit, a control circuit, and a sorting circuit. The control circuit is configured to read a first instruction from the instruction storage circuit and send the decoded first instruction to the sorting circuit. The sorting circuit is configured to, in response to the decoded first instruction, read M first sequences according to the storage addresses, and output one ordered sequence of length M×N if the M first sequences are all ordered sequences, or output M ordered sequences of length N if any of the M first sequences is unordered. In this way, the sorting circuit in the processor can sort multiple sequences in response to the decoded first instruction. In other words, the processor can sort the sequences by executing one first instruction, which avoids executing a large number of repeated instructions, reduces the number and execution time of sorting instructions, and improves sorting efficiency.

Description

Processor, sorting method and electronic equipment
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a processor, a sorting method, and an electronic device.
Background
Sorting is an operation frequently performed in electronic devices such as computers; it rearranges an unordered sequence into an ordered sequence. Currently, various sorting algorithms are implemented by a processor in an electronic device executing a computer program (e.g., a sorting program).
However, such a sorting program usually contains a large number of repeated instructions, such as operation instructions, memory access instructions, and control instructions. Executing them costs the processor a large amount of time, so sorting efficiency is low.
Disclosure of Invention
The embodiments of the application provide a processor, a sorting method, and an electronic device that can reduce the number of instructions used during sorting and thereby improve sorting efficiency.
To achieve the above purpose, the application adopts the following technical solutions:
In a first aspect, a processor is provided. The processor includes an instruction storage circuit, a control circuit, and a sorting circuit, where the instruction storage circuit is coupled to the control circuit and the control circuit is coupled to the sorting circuit. The control circuit is configured to read a first instruction from the instruction storage circuit and decode the first instruction. The decoded first instruction includes the storage addresses of M first sequences, where each first sequence has length N, M is an integer greater than 1, and N is an integer greater than 1. The control circuit is further configured to send the decoded first instruction to the sorting circuit. The sorting circuit is configured to perform the following steps in response to the decoded first instruction: read the M first sequences according to the storage addresses, and output one ordered sequence of length M×N if the M first sequences are all ordered sequences, or output M ordered sequences of length N if any of the M first sequences is unordered.
Based on the processor of the first aspect, the sorting circuit in the processor can, in response to the decoded first instruction, arrange M ordered sequences into one ordered sequence of length M×N, or arrange M sequences that include unordered sequences into M ordered sequences of length N. In other words, the processor can sort the sequences by executing one first instruction. The sorting operation is therefore completed with fewer instructions and without executing a large number of repeated instructions, which reduces the number and execution time of the sorting instructions and improves sorting efficiency.
In a possible implementation, the sorting circuit of the first aspect may include a sorting controller, a sorter, and M first buffers, where the sorting controller is coupled to the M first buffers and the M first buffers are coupled to the sorter. The sorting controller is configured to read the M first sequences according to the storage addresses and store them in the M first buffers, where each first buffer stores one first sequence. The sorter is configured to read one ordered sequence of length M×N from the M first buffers if the M first sequences are all ordered sequences, or to read M ordered sequences of length N from the M first buffers if any of the M first sequences is unordered. The first buffers can be implemented as internal buffers whose read/write speed is faster than that of an external memory (such as a global memory). They can serve as a first-level buffer between the external memory and the sorter, so that the sorter does not read data directly from the external memory. This reduces the delay with which the sorter reads data and further improves sorting efficiency.
Optionally (hereinafter referred to as scheme 1), the sorter is further configured to read one ordered sequence of length M×N from the M first buffers according to a first reading rule if the M first sequences are all ordered sequences. The first reading rule may be: each time, the first data is read out of the M first buffers. In other words, while reading data from the M first buffers, the sorter arranges the M ordered sequences into one ordered sequence. The processor can thus sort the M ordered sequences by executing the first instruction once, which reduces the number and execution time of the sorting instructions and improves sorting efficiency.
Or, optionally (hereinafter referred to as scheme 2), the sorter is further configured to read M ordered sequences of length N from the M first buffers according to a second reading rule if any of the M first sequences is unordered. The second reading rule may be: each time, one ordered sequence of length N is read out of the M first buffers. In other words, while reading data from the M first buffers, the sorter arranges the M sequences, which include unordered sequences, into M ordered sequences. The processor can thus sort M sequences that include unordered sequences by executing the first instruction once, which reduces the number and execution time of the sorting instructions and improves sorting efficiency.
Scheme 1 and scheme 2 may be implemented independently or in combination. When they are combined, scheme 2 is performed first to arrange the M unordered sequences into M ordered sequences, and scheme 1 is then performed to arrange the M ordered sequences into one ordered sequence. The processor can thus arrange the M sequences into 1 ordered sequence by executing the first instruction twice. The sorting operation is therefore completed with fewer instructions and without executing a large number of repeated instructions, which reduces the number and execution time of the sorting instructions and improves sorting efficiency.
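The combination of the two schemes can be illustrated in software. The following is a minimal sketch, not the hardware implementation: it assumes ascending order, assumes that the "first data" of the first reading rule is the smallest element currently at the head of any first buffer, and checks orderedness explicitly (how the hardware selects between the two rules is not stated in this section). All function names are hypothetical.

```python
def is_ordered(seq):
    """Return True if seq is in ascending order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def read_by_first_rule(buffers):
    """Scheme 1 (assumed rule): repeatedly take the smallest head element."""
    heads = [0] * len(buffers)              # one read pointer per first buffer
    out = []
    for _ in range(sum(len(b) for b in buffers)):
        candidates = [m for m in range(len(buffers)) if heads[m] < len(buffers[m])]
        best = min(candidates, key=lambda m: buffers[m][heads[m]])
        out.append(buffers[best][heads[best]])
        heads[best] += 1
    return out


def read_by_second_rule(buffers):
    """Scheme 2: each first buffer is read out as one ordered sequence."""
    return [sorted(b) for b in buffers]


def execute_first_instruction(first_sequences):
    """Model one execution of the first instruction on M sequences of length N."""
    buffers = [list(s) for s in first_sequences]    # M first buffers
    if all(is_ordered(b) for b in buffers):
        return [read_by_first_rule(buffers)]        # one sequence of length M*N
    return read_by_second_rule(buffers)             # M sequences of length N


# Two executions: scheme 2 first, then scheme 1.
step1 = execute_first_instruction([[9, 1, 4], [8, 3, 2], [6, 7, 5]])
step2 = execute_first_instruction(step1)
```

In this model the first call returns M = 3 ordered sequences of length N = 3, and the second call merges them into a single ordered sequence of length 9.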
Optionally, the sorting circuit of the first aspect may further include a second buffer coupled to the sorting controller. The sorting controller is configured to read the M first sequences according to the storage addresses and store them in the second buffer, and is further configured to move the M first sequences from the second buffer to the M first buffers. The second buffer may be implemented as a unified buffer (UB), a cache, a static memory, or another memory whose read/write speed is faster than that of the external memory (such as a global memory). It can serve as a second-level buffer between the external memory and the sorter, so that the first buffers do not read data directly from the external memory. This further reduces the delay with which the first buffers receive data, and can reduce or even eliminate the delay with which the sorter reads data from the first buffers, further improving sorting efficiency.
Further, the sorting controller may include a first read-write controller and a second read-write controller. The first read-write controller is coupled to the second read-write controller, the second buffer is coupled to both the first read-write controller and the second read-write controller, and the M first buffers are each coupled to the first read-write controller. The second read-write controller is configured to read the M first sequences according to the storage addresses and store them in the second buffer. The first read-write controller is configured to move the M first sequences from the second buffer to the M first buffers. Together, the first read-write controller, the second read-write controller, and the second buffer can continuously receive data to be sorted and continuously send it to the M first buffers, so that the sorter can sort continuously, further improving sorting efficiency.
Still further, the first read-write controller is further configured to send moved-data-amount information to the second read-write controller while moving the M first sequences. Because the moved-data-amount information corresponds to the amount of free storage space in the second buffer, the second read-write controller can use it to move the data of the M first sequences that has not yet been stored in the second buffer into the second buffer as soon as possible. In this way, sorting efficiency can be further improved.
Further, the sorting controller also includes a third read-write controller, which is coupled to the sorter and to the second buffer. The third read-write controller is configured to move the ordered sequence output by the sorter to the second buffer. The second buffer may be implemented as a unified buffer, a cache, a static memory, or another memory whose read/write speed is faster than that of the external memory, and can serve as a second-level buffer between the external memory and the sorter. Because the third read-write controller sends the sorted data to the second buffer rather than to the external memory, the delay with which the sorter outputs sorted data is reduced and sorting efficiency is improved.
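The interplay between the two read-write controllers resembles credit-based flow control. The sketch below is a simplified software model under stated assumptions: the second buffer has a fixed capacity, the first read-write controller moves at most `chunk` items per step, and the reported moved amount is added directly to the free-space counter of the second read-write controller. The function and parameter names are hypothetical and do not come from the application.

```python
from collections import deque


def stream_through_second_buffer(external, capacity, chunk):
    """Move all items from external memory to the first buffers via the second buffer.

    Returns the items in the order they reached the first buffers and the
    peak occupancy observed in the second buffer.
    """
    second_buffer = deque()
    delivered = []            # data that has reached the first buffers
    next_index = 0            # next item still waiting in external memory
    free_space = capacity     # free space as tracked by the second RW controller
    peak = 0

    while len(delivered) < len(external):
        # Second read-write controller: fill whatever space is known to be free.
        n_fill = min(free_space, len(external) - next_index)
        second_buffer.extend(external[next_index:next_index + n_fill])
        next_index += n_fill
        free_space -= n_fill
        peak = max(peak, len(second_buffer))

        # First read-write controller: move up to `chunk` items onward.
        moved = 0
        while second_buffer and moved < chunk:
            delivered.append(second_buffer.popleft())
            moved += 1

        # Moved-data-amount information: the reported amount frees space.
        free_space += moved

    return delivered, peak


data = list(range(10))
delivered, peak = stream_through_second_buffer(data, capacity=4, chunk=3)
```

Because the free-space counter is updated only from the reported moved amounts, the second buffer never overflows, and refilling can start as soon as space is released.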
In a second aspect, a processor is provided. The processor includes an instruction storage circuit, a control circuit, and a sorting circuit, where the instruction storage circuit is coupled to the control circuit and the control circuit is coupled to the sorting circuit. The sorting circuit includes I sorters, each having J inputs and J outputs. The J inputs of the 1st of the I sorters are the J inputs of the sorting circuit, the J outputs of the I-th of the I sorters are the J outputs of the sorting circuit, and the J outputs of the i-th of the I sorters are respectively connected to the J inputs of the (i+1)-th sorter, where i ≤ I, I is a positive integer, and J is a positive even number. The control circuit is configured to read a second instruction from the instruction storage circuit and decode the second instruction. The decoded second instruction includes the storage addresses of H data, where H ≤ J. The control circuit is further configured to send the decoded second instruction to the sorting circuit. The sorting circuit is configured to perform the following steps in response to the decoded second instruction: read the H data according to the storage addresses, and sort the H data using the i-th sorter in the sorting circuit, where the degree of order of the H data after sorting is higher than before sorting.
Based on the processor of the second aspect, the sorting circuit in the processor can raise the degree of order of the H data I times in response to the decoded second instruction, so that the H data can be arranged into an ordered sequence by sorting it one or more times. In other words, the processor can sort the H data by executing one second instruction. The sorting operation on the H data is therefore completed with fewer instructions and without executing a large number of repeated instructions, which reduces the number and execution time of the sorting instructions and improves sorting efficiency.
In one possible design, the i-th sorter may include K comparators, where K ≥ J/2 and K is a positive integer. The k-th of the K comparators is configured to compare two of the H data. Taking the sorting circuit shown in Fig. 5 (K = 4, J = 8) as an example, each sorter may include 4 comparators (A1-A4), and each comparator may have two inputs and two outputs. Each comparator input and each comparator output may be connected to a register, which is used to temporarily store data. The 8 inputs of the 4 comparators of the i-th sorter are the 8 inputs of that sorter, and the 8 outputs of the 4 comparators of the i-th sorter are the 8 outputs of that sorter. The two outputs of A1 of the 1st sorter may be connected to one input of A1 and one input of A2 of the 2nd sorter, respectively; the two outputs of A2 of the 1st sorter may be connected to one input of A2 and one input of A3 of the 2nd sorter, respectively; the two outputs of A3 of the 1st sorter may be connected to one input of A3 and one input of A4 of the 2nd sorter, respectively; and the two outputs of A4 of the 1st sorter may be connected to one input of A4 and one input of A1 of the 2nd sorter, respectively. The connections between any other two adjacent sorters follow the connections between the 1st sorter and the 2nd sorter.
When the sorting circuit shown in Fig. 5 is used to sort 8 data (i.e., H = 8), assume that the sorting circuit sorts the data in ascending order. First, the data to be sorted that is input to the 1st sorter is "8, 7, 6, 5, 4, 3, 2, 1"; the 1st sorter sorts this sequence, which becomes "7, 8, 5, 6, 3, 4, 1, 2". Then the 2nd sorter sorts the sequence output by the 1st sorter, which becomes "2, 5, 8, 3, 6, 1, 4, 7". And so on, until the sequence output by the 7th sorter is "1, 2, 3, 4, 5, 6, 7, 8". With the sorting circuit shown in Fig. 5, the degree of order of the data to be sorted rises each time the data passes through the K comparators of one sorter. This provides one implementation of the sorter that raises the degree of order of the H data.
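The stage-by-stage values above can be reproduced with a small simulation. This is a sketch under an assumption inferred from the stated wiring and sequences rather than from the figure itself: odd-numbered sorters compare positions (0,1), (2,3), (4,5), (6,7), while even-numbered sorters compare (1,2), (3,4), (5,6) plus a wrap-around pair that places the smaller value at position 0 and the larger at position 7. The function name is hypothetical.

```python
def fig5_style_sorter(data, stage):
    """Apply one sorter (4 comparators for J = 8) of the assumed Fig. 5 network.

    `stage` is the 1-based index of the sorter in the chain.
    """
    out = list(data)
    j = len(out)
    if stage % 2 == 1:
        # Odd-numbered sorters: neighbouring pairs starting at position 0.
        pairs = [(p, p + 1) for p in range(0, j, 2)]
    else:
        # Even-numbered sorters: shifted pairs plus the wrap-around comparator.
        pairs = [(p, p + 1) for p in range(1, j - 1, 2)] + [(0, j - 1)]
    for lo, hi in pairs:
        if out[lo] > out[hi]:           # comparator: smaller value to the lower position
            out[lo], out[hi] = out[hi], out[lo]
    return out


sequence = [8, 7, 6, 5, 4, 3, 2, 1]
stages = []
for stage in range(1, 8):               # 7 sorters in series
    sequence = fig5_style_sorter(sequence, stage)
    stages.append(sequence)
```

Under this assumption, `stages[0]` and `stages[1]` match the outputs given for the 1st and 2nd sorters, and `stages[6]` is the fully ordered sequence.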
Optionally, the K comparators may include J/2 first comparators and (J/2)-1 second comparators. J-2 of the outputs of the J/2 first comparators are respectively connected to the J-2 inputs of the (J/2)-1 second comparators. The J inputs of the J/2 first comparators may be the J inputs of the i-th sorter, and the other 2 outputs of the J/2 first comparators together with the J-2 outputs of the (J/2)-1 second comparators may be the J outputs of the i-th sorter. Taking the sorting circuit shown in Fig. 4 (K = 7, J = 8) as an example, each sorter includes 7 comparators: 4 first comparators (A1-A4) and 3 second comparators (B1-B3), each having two inputs and two outputs. Taking the 7 comparators in the 1st sorter as an example, the 8 inputs of A1-A4 are the 8 inputs of the sorter, 6 outputs of A1-A4 are respectively connected to the 6 inputs of B1-B3, and the other 2 outputs of A1-A4 together with the 6 outputs of B1-B3 are the 8 outputs of the 1st sorter. The two inputs and the two outputs of each comparator may each be connected to a register, which is used to temporarily store data.
When the sorting circuit shown in Fig. 4 is used to sort 8 data (i.e., H = 8), assume that the sorting circuit sorts the data in ascending order. First, the data to be sorted that is input to the sorting circuit is "8, 7, 6, 5, 4, 3, 2, 1"; the 1st sorter sorts this sequence, which becomes "7, 5, 8, 3, 6, 1, 4, 2". Then the 2nd sorter sorts the sequence output by the 1st sorter, which becomes "5, 3, 7, 1, 8, 2, 6, 4". And so on, until the sequence output by the 4th sorter is "1, 2, 3, 4, 5, 6, 7, 8". In the sorting process shown in Fig. 4, the degree of order of the data to be sorted rises each time the data passes through the K comparators of one sorter. This provides one implementation of the sorter that raises the degree of order of the H data.
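The Fig. 4 example can also be checked in software. The sketch below assumes that the first comparators A1-A4 compare positions (0,1), (2,3), (4,5), (6,7) and that the second comparators B1-B3 then compare positions (1,2), (3,4), (5,6), with the smaller value always moved to the lower position. This assignment is inferred from the sequences quoted above; the function name is hypothetical.

```python
def fig4_style_sorter(data):
    """Apply one sorter with J/2 first comparators and (J/2)-1 second comparators."""
    out = list(data)
    j = len(out)
    # start = 0: first comparators (A1..A4); start = 1: second comparators (B1..B3)
    for start in (0, 1):
        for p in range(start, j - 1, 2):
            if out[p] > out[p + 1]:
                out[p], out[p + 1] = out[p + 1], out[p]
    return out


sequence = [8, 7, 6, 5, 4, 3, 2, 1]
stages = []
for _ in range(4):                      # 4 sorters in series, i.e. J/2 for J = 8
    sequence = fig4_style_sorter(sequence)
    stages.append(sequence)
```

Each sorter in this model performs one even and one odd compare-exchange layer, which is why 4 sorters suffice here where the 4-comparator variant needs 7.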
Optionally, the number of times the I sorters sort the H data is greater than or equal to J/2. For example, referring again to the sorting process shown in Fig. 4, the data input to the sorting circuit is "8, 7, 6, 5, 4, 3, 2, 1", which is the least ordered of all sequences of length 8. After four sorting passes (J/2 = 4) through the sorting circuit shown in Fig. 4, this sequence is exactly arranged into an ordered sequence. In other words, when the number of times the I sorters sort the H data is greater than or equal to J/2, the sorting circuit can arrange the H data into an ordered sequence.
Further, the J inputs of the 1st of the I sorters are respectively connected to the outputs of J selectors. The first inputs of the J selectors may be the J inputs of the sorting circuit of the second aspect, and the J outputs of the I-th of the I sorters are respectively connected to the second inputs of the J selectors. Data output from the J outputs of the I-th sorter can therefore be fed back into the sorting circuit through the second inputs of the J selectors; that is, the data to be sorted can be sorted cyclically in the sorting circuit of the second aspect. When I is smaller than J/2, the number of cycles can be controlled so that the number of times the I sorters sort the H data is still greater than or equal to J/2, which reduces the hardware scale and saves cost.
Still further, I and J may satisfy the following relationship: I ≥ J/2. In other words, the number of sorters is greater than or equal to J/2, which ensures that the number of times the I sorters sort the H data is greater than or equal to J/2, so that the sorting circuit can arrange the H data into an ordered sequence.
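The feedback path through the selectors can be modelled as a loop around a short chain of sorters. The sketch below is illustrative only: it reuses the assumed Fig. 4 style sorter (a layer of pairs starting at position 0 followed by a layer starting at position 1), and the loop counter stands in for whatever control logic switches the selectors between their first and second inputs. All names are hypothetical.

```python
def sorter_pass(data):
    """One assumed Fig. 4 style sorter: two compare-exchange layers."""
    out = list(data)
    for start in (0, 1):
        for p in range(start, len(out) - 1, 2):
            if out[p] > out[p + 1]:
                out[p], out[p + 1] = out[p + 1], out[p]
    return out


def sort_with_loopback(data, num_sorters, min_passes):
    """Recirculate data through `num_sorters` sorters until `min_passes` is reached."""
    current = list(data)     # first round: selectors pass the external inputs
    passes = 0
    rounds = 0
    while passes < min_passes:
        for _ in range(num_sorters):
            current = sorter_pass(current)
            passes += 1
        rounds += 1          # later rounds: selectors pass the fed-back outputs
    return current, passes, rounds


J = 8
result, passes, rounds = sort_with_loopback(
    [8, 7, 6, 5, 4, 3, 2, 1], num_sorters=2, min_passes=J // 2
)
```

With I = 2 sorters and J = 8, two rounds give the J/2 = 4 sorting passes that the example input requires, using half the comparator hardware of a four-sorter chain.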
In a third aspect, a sorting method is provided. The method is applied to the processor of the first aspect, which includes an instruction storage circuit, a control circuit, and a sorting circuit, where the instruction storage circuit is coupled to the control circuit and the control circuit is coupled to the sorting circuit. The sorting method includes the following steps. The control circuit reads a first instruction from the instruction storage circuit and decodes the first instruction. The decoded first instruction includes the storage addresses of M first sequences, where each first sequence has length N, M is an integer greater than 1, and N is an integer greater than 1. The control circuit sends the decoded first instruction to the sorting circuit. In response to the decoded first instruction, the sorting circuit reads the M first sequences according to the storage addresses, and outputs one ordered sequence of length M×N if the M first sequences are all ordered sequences, or outputs M ordered sequences of length N if any of the M first sequences is unordered.
In one possible design, the sorting circuit may include: the device comprises a sequencing controller, a sequencer and M first buffers, wherein the sequencing controller is coupled with the M first buffers, and the M first buffers are coupled with the sequencer. The sorting circuit responds to the decoded first instruction, reads M first sequences according to the memory address, and outputs an ordered sequence with length of m×n if the M first sequences are all ordered sequences, or outputs M ordered sequences with length of N if there is an unordered sequence in the M first sequences, which may include: the sequencing controller reads the M first sequences according to the storage addresses and stores the M first sequences in the M first buffers. Wherein each first buffer stores a first sequence. If the M first sequences are all ordered sequences, the sequencer reads an ordered sequence with a length of m×n from the M first buffers, or if there is an unordered sequence in the M first sequences, the sequencer reads M ordered sequences with a length of N from the M first buffers.
Optionally, the step in which the sorter reads one ordered sequence of length M×N from the M first buffers if the M first sequences are all ordered sequences may include: if the M first sequences are all ordered sequences, the sorter reads one ordered sequence of length M×N from the M first buffers according to a first reading rule. The first reading rule may be: each time, the first data is read out of the M first buffers.
Or, optionally, the step in which the sorter reads M ordered sequences of length N from the M first buffers if any of the M first sequences is unordered may include: if any of the M first sequences is unordered, the sorter reads M ordered sequences of length N from the M first buffers according to a second reading rule. The second reading rule may be: each time, one ordered sequence of length N is read out of the M first buffers.
Optionally, the sorting circuit may further include a second buffer coupled to the sorting controller. The step in which the sorting controller reads the M first sequences according to the storage addresses and stores them in the M first buffers may include: the sorting controller reads the M first sequences according to the storage addresses and stores them in the second buffer, and the sorting controller then moves the M first sequences from the second buffer to the M first buffers.
Further, the above-mentioned sequencing controller may include a first read-write controller and a second read-write controller, where the first read-write controller is coupled to the second read-write controller, the second buffer is coupled to the first read-write controller and the second read-write controller, and the M first buffers are all coupled to the first read-write controller. The above-mentioned sequencing controller reads the M first sequences according to the storage address, and stores the M first sequences in the second buffer, which may include: the second read-write controller reads the M first sequences according to the storage address and stores the M first sequences in the second buffer. The moving the M first sequences from the second buffer to the M first buffers by the ordering controller may include: the first read-write controller moves the M first sequences from the second buffer to the M first buffers.
Still further, the method of the third aspect may further include: when M first sequences are moved, the first read-write controller transmits the moved data amount information to the second read-write controller.
Further, the sorting controller may further include a third read-write controller coupled to the sorter and the second buffer, respectively. The method according to the third aspect, further comprising: the third read-write controller moves the ordered sequence output by the sequencer to the second buffer.
In addition, the technical effects of the sorting method described in the third aspect may refer to the technical effects of the processor described in the first aspect, which are not described herein.
In a fourth aspect, a sorting method is provided. The method is applied to the processor of the second aspect, which includes an instruction storage circuit, a control circuit, and a sorting circuit, where the instruction storage circuit is coupled to the control circuit and the control circuit is coupled to the sorting circuit. The sorting circuit includes I sorters, each having J inputs and J outputs. The J inputs of the 1st of the I sorters are the J inputs of the sorting circuit, the J outputs of the I-th of the I sorters are the J outputs of the sorting circuit, and the J outputs of the i-th of the I sorters are respectively connected to the J inputs of the (i+1)-th sorter, where i ≤ I, I is a positive integer, and J is a positive even number. The sorting method includes the following steps. The control circuit reads a second instruction from the instruction storage circuit and decodes the second instruction, where the decoded second instruction includes the storage addresses of H data and H ≤ J. The control circuit sends the decoded second instruction to the sorting circuit. In response to the decoded second instruction, the sorting circuit reads the H data according to the storage addresses. The sorting circuit sorts the H data using the i-th sorter in the sorting circuit, and the degree of order of the H data after sorting is higher than before sorting.
In one possible design, the i-th sorter may include K comparators, where K ≥ J/2 and K is a positive integer. The step in which the sorting circuit sorts the H data using the i-th sorter may include: the k-th of the K comparators compares two of the H data.
Optionally, the K comparators may include J/2 first comparators and (J/2)-1 second comparators. J-2 of the outputs of the J/2 first comparators are respectively connected to the J-2 inputs of the (J/2)-1 second comparators. The J inputs of the J/2 first comparators may be the J inputs of the i-th sorter, and the other 2 outputs of the J/2 first comparators together with the J-2 outputs of the (J/2)-1 second comparators may be the J outputs of the i-th sorter.
Optionally, the number of times the I sequencers sequence the H data is greater than or equal to J/2.
Further, the J input ends of the 1 st sequencer in the I sequencers are respectively connected to the output ends of the J selectors, and the first input ends of the J selectors may be J input ends of the sequencing circuit. The J output ends of the I-th sequencer in the I sequencers are respectively connected with the second input ends of the J selectors.
Still further, I and J may satisfy the following relationship: I ≥ J/2.
In addition, the technical effects of the sorting method described in the fourth aspect may refer to the technical effects of the processor described in the second aspect, which are not described herein.
In a fifth aspect, a sorting method is provided. The method is applied to a processor that includes an instruction storage circuit, a control circuit, and a plurality of sorting circuits, where the instruction storage circuit is coupled to the control circuit and the control circuit is coupled to the plurality of sorting circuits. The sorting method includes the following steps. The control circuit reads a second instruction from the instruction storage circuit and decodes the second instruction; the decoded second instruction includes the storage address of the data to be sorted. The control circuit sends the decoded second instruction to the plurality of sorting circuits. In response to the decoded second instruction, the plurality of sorting circuits read the data to be sorted according to the storage address. The plurality of sorting circuits sort the data to be sorted. The data to be sorted includes a plurality of sequences; each sorting circuit is configured to read M of these sequences, and to output one ordered sequence of length M×N if the M sequences are all ordered sequences, or to output M ordered sequences of length N if any of the M sequences is unordered.
Based on the sorting method of the fifth aspect, a plurality of sorting circuits can be used to sort the data to be sorted. Therefore, if the amount of data to be sorted is very large (for example, more than 1 million data items), the plurality of sorting circuits can sort simultaneously, which improves sorting efficiency.
The sorting circuit may be a sorting circuit in the processor according to the first aspect.
In one possible design, the sorting of the data to be sorted by the plurality of sorting circuits may include: the plurality of sorting circuits iteratively sort the data to be sorted until the data to be sorted is arranged into 1 ordered sequence. In each iteration, the plurality of sorting circuits are used for arranging the N ordered sequences output by the previous iteration into [formula omitted] ordered sequences, where M is an integer greater than 1.
Optionally, the iterative sorting of the data to be sorted by the plurality of sorting circuits, until the data to be sorted is arranged into 1 ordered sequence, may include: each sorting circuit sorts part of the data to be sorted into 1 ordered sequence. Based on [formula omitted] sorting circuits among the plurality of sorting circuits, E ordered sequences are arranged into [formula omitted] ordered sequences. If [condition omitted], [formula omitted] is determined as E, and the step of arranging E ordered sequences into [formula omitted] ordered sequences based on [formula omitted] sorting circuits among the plurality of sorting circuits is executed again; otherwise, the ordered sequence corresponding to the data to be sorted is output. In this way, the number of sorting circuits participating in sorting can be gradually reduced while the data to be sorted is being sorted, which reduces the processing resources occupied.
In a sixth aspect, an electronic device is provided. The electronic device comprises the processor according to any one of the possible implementations of the first aspect and/or the processor according to any one of the possible implementations of the second aspect.
In addition, for the technical effects of the electronic device described in the sixth aspect, reference may be made to the technical effects of the processor described in any implementation manner of the first aspect and the second aspect, which are not described herein again.
In a seventh aspect, there is provided a computer readable storage medium comprising: computer programs or instructions; the computer program or instructions, when run on a computer, cause the computer to perform the sorting method according to any one of the possible implementations of the third to fifth aspects.
Further, the technical effects of the computer readable storage medium according to the seventh aspect may refer to the technical effects of the sorting method according to any one of the implementation manners of the third aspect to the fifth aspect, which are not described herein.
In an eighth aspect, there is provided a computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the sorting method according to any one of the possible implementations of the third to fifth aspects.
Further, the technical effects of the computer program product according to the eighth aspect may refer to the technical effects of the sorting method according to any implementation manner of the third aspect to the fifth aspect, which are not described herein.
Drawings
FIG. 1 is a schematic diagram of a processor according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a sorting circuit according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a first buffer receiving and transmitting data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of sorting M sequences using the sorting circuit of FIG. 2;
FIG. 5 is a schematic diagram of another processor according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another sorting circuit according to an embodiment of the present application;
FIG. 7 is a second schematic diagram of another sorting circuit according to an embodiment of the present application;
FIG. 8 is a third schematic diagram of another sorting circuit according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another sorting circuit according to an embodiment of the present application;
FIG. 10 is a schematic diagram of sorting H data using the sorting circuit of FIG. 6;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 12 is a flowchart of a sorting method according to an embodiment of the present application;
FIG. 13 is a second flowchart of a sorting method according to an embodiment of the present application;
FIG. 14 is a flowchart of a sorting method according to an embodiment of the present application;
FIG. 15 is a flowchart of a sorting method according to an embodiment of the present application;
Reference numerals: 100-processor; 110-instruction storage circuit; 120-control circuit; 130-sorting circuit; 210-sorter; 220-first buffer; 230-sorting controller; 231-first read-write controller; 232-second read-write controller; 233-third read-write controller; 240-second buffer; 250-output buffer; 500-processor; 510-instruction storage circuit; 520-control circuit; 530-sorting circuit; 531-sorter.
Detailed Description
First, technical terms that may appear in the embodiments of the present application are briefly described.
Sequence: refers to a plurality of data arranged in a column. The length of a sequence is equal to the number of data contained in the sequence. For example, assume a sequence of: "7, 3, 6, 5, 10, 15", then the length of this sequence is 6. Wherein the sequences include ordered sequences and unordered sequences.
Ordered sequence: it means that the data in a sequence is arranged according to an ordering rule. Assuming that the ordering rule of the data is arranged from small to large, then the sequence: "3, 5, 6, 7, 10, 15" may be referred to as an ordered sequence. Wherein the ordering rule may include: small to large or large to small.
Unordered sequence: means that the data in a sequence is not arranged according to the ordering rule. Assuming that the ordering rule of the data is from small to large, the sequence "7, 3, 6, 5, 10, 15" may be referred to as an unordered sequence.
Degree of order of a sequence: refers to how close a sequence is to being ordered. The degree of order of a sequence can be represented by the number of inversion pairs (count inversions) in the sequence, that is, pairs of data that appear in the opposite order to the ordering rule. The number of inversion pairs in a sequence is inversely related to the degree of order of the sequence.
TOPK ordering: refers to finding the largest K data in a sequence or the smallest K data in a sequence.
Illustratively, assuming that the ordering rule of the data is from small to large, for the sequence "7, 3, 6, 5, 10, 15", the number of inversion pairs is 4, namely: (7, 3), (7, 6), (7, 5), (6, 5). For the sequence "3, 7, 6, 5, 10, 15", the number of inversion pairs is 3, namely: (7, 6), (7, 5), (6, 5). The former therefore has a lower degree of order than the latter.
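The inversion counts in the example above can be checked with a short sketch. The function name `count_inversions` is introduced here purely for illustration and is not part of the application; the brute-force pairwise comparison is chosen for readability, not efficiency.

```python
def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j].

    Assumes the ordering rule is "from small to large".
    """
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


former = [7, 3, 6, 5, 10, 15]
latter = [3, 7, 6, 5, 10, 15]
print(count_inversions(former))  # 4
print(count_inversions(latter))  # 3
```

A fully ordered sequence has an inversion count of 0, which corresponds to the highest degree of order.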
Micro-architecture: is the internal design of the processor implementing the instruction set.
The following description of the technical solutions according to the embodiments of the present application will be given with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments.
Hereinafter, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the present application, unless specifically stated and limited otherwise, the term "connected" is to be construed broadly, and for example, "connected" may refer to a physical direct connection, or may refer to a connection that is electrically achieved through an intermediary, such as a resistor, inductor, capacitor, or other electronic device.
Embodiments of the present application provide a processor 100. FIG. 1 is a schematic diagram of a processor 100 according to an embodiment of the application. Referring to FIG. 1, the processor 100 includes: an instruction storage circuit 110, a control circuit 120, and a sorting circuit 130. The instruction storage circuit 110 is coupled to the control circuit 120, and the control circuit 120 is coupled to the sorting circuit 130.
The processor 100 may be a central processing unit (CPU), and the processor 100 may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor, such as a vector processor, a coprocessor, an ARM (advanced RISC machines) processor, or the like, and embodiments of the application are not limited in this respect.
The instruction storage circuit 110 may store a plurality of instructions. The instruction storage circuit 110 may be implemented using an instruction cache or instruction registers, and may be included in the memory subsystem of the processor 100. The control circuit 120 may pre-process instructions, including instruction fetching, decoding and the like. The control circuit 120 may be a control unit in the processor 100, which may also be referred to as a front end. The sorting circuit 130 may be implemented in the micro-architecture, and the sorting circuit 130 may be included in an execution unit (execution engine) of the processor 100, which may also be referred to as an arithmetic unit.
In an embodiment of the present application, the control circuit 120 is configured to read the first instruction from the instruction storage circuit 110 and decode the first instruction. The decoded first instruction includes a storage address of M first sequences, where the length of each first sequence is N, M is an integer greater than 1, and N is an integer greater than 1. The control circuit 120 is also configured to send the decoded first instruction to the sorting circuit 130. The sorting circuit 130, in response to the decoded first instruction, performs the following steps: reading the M first sequences according to the storage address, and outputting one ordered sequence with a length of M×N if the M first sequences are all ordered sequences, or outputting M ordered sequences each with a length of N if an unordered sequence exists among the M first sequences.
In one possible design, referring to FIG. 2, FIG. 2 is a schematic structural diagram of a sorting circuit 130 according to an embodiment of the present application, where the sorting circuit 130 may include: a sorter 210, a sorting controller 230, and M first buffers 220 (M = 4 is used as an example in FIG. 2). The sorting controller 230 is coupled to the M first buffers 220, and the M first buffers 220 are coupled to the sorter 210.
The first buffer 220 may be implemented using registers or register-related circuit components, and may be used to store sequences. Specifically, the first buffer 220 may be implemented by a built-in buffer; for example, the first buffer 220 may be an input buffer (IB). The sorter 210 may be implemented with a comparator or comparator-related circuit components, and may be used to sort a plurality of data and output an ordered sequence.
The sequencer 210, the sequencing controller 230, and the M first buffers 220 may be combined to implement a data sequencing function, and specific implementation procedures are described below.
The ordering controller 230 is configured to read the M first sequences according to the storage address, and send one first sequence to each first buffer 220. Wherein the sequencing controller 230 may receive the M first sequences from other units, devices or apparatuses. Illustratively, assuming that the M first sequences are stored in a Global Memory (GM), the ordering controller 230 may send a read instruction to the global memory according to the storage address and through the bus in response to the decoded first instruction, the read instruction being used to instruct the global memory to send the M first sequences to the ordering controller 230. The ordering controller 230 may then receive the M first sequences from the global memory and send one first sequence to each first buffer 220. Of course, the ordering controller 230 may also receive the M first sequences from the global memory without sending a read instruction to the global memory.
Each first buffer 220 is configured to receive one first sequence, so that M first buffers 220 may receive and store M first sequences. The M first sequences may also be referred to as data to be ordered.
The M first sequences may all be ordered sequences, or unordered sequences may exist among them. If the M first sequences are all ordered sequences, the sorter 210 may arrange the M ordered sequences into one ordered sequence in the following mode 1. If unordered sequences exist among the M first sequences, the sorter 210 may arrange the M first sequences into M ordered sequences in the following mode 2.
Mode 1: the sorter 210 is configured to read one ordered sequence with a length of M×N from the M first buffers 220. Specifically, if the M first sequences are all ordered sequences, the sorter 210 reads one ordered sequence with a length of M×N from the M first buffers 220 according to the first reading rule; that is, the sorting circuit 130 may implement the function of arranging M ordered sequences into one ordered sequence.
The first reading rule may be: each time, the first-ranked data is read out from the M first buffers 220, that is, the data that ranks first under the ordering rule among the M ordered sequences is read out each time. For example, assume that there are two ordered sequences, "3, 5, 6, 9" and "7, 13, 25, 26", and that the ordering rule of the ordered sequences is from small to large; then the first-ranked data of the two ordered sequences is 3. Optionally, when the first buffers 220 are implemented using input buffers, the sorter 210 may read out the first-ranked data at the output ends of the M first buffers 220 each time.
Illustratively, assuming that the ordering rule of the ordered sequence is from small to large, n=4, m=4, and the 4 first buffers 220 are respectively: IB0, IB1, IB2, IB3, table 1 is a table of the first sequences received by the 4 first buffers 220, respectively. Wherein the first sequence received by IB0 is: "3, 5, 6, 9", the first sequence received by IB1 is: "7, 13, 25, 26", the first sequence received by IB2 is: "2, 8, 9, 15", the first sequence received by IB3 is: "33, 36, 50, 72".
TABLE 1
First buffer    First sequence
IB0             3, 5, 6, 9
IB1             7, 13, 25, 26
IB2             2, 8, 9, 15
IB3             33, 36, 50, 72
Referring to Table 1, the sorter 210 may first read out the first-ranked data among the 4 first sequences, i.e., data 2 in IB2. After this data is read out, data 2 is no longer stored in IB2, so the head data of the first sequence in IB2 becomes 8. The sorter 210 may then again read out the first-ranked data among the 4 first sequences, i.e., data 3 in IB0. By analogy, the sorter 210 may read the data in the 4 first buffers 220 multiple times according to the first reading rule, and the read data, arranged in read order, form one ordered sequence of length 4×4 = 16: "2, 3, 5, 6, 7, 8, 9, 9, 13, 15, 25, 26, 33, 36, 50, 72".
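The first reading rule can be modelled in software as follows. This is only a minimal sketch under stated assumptions: each first buffer is represented by a `collections.deque` used as a queue, the ordering rule is from small to large, and the function name `merge_mode1` is hypothetical. It does not describe how the comparator hardware of the sorter is actually built.

```python
from collections import deque


def merge_mode1(first_sequences):
    """Merge M ascending sequences by repeatedly reading the smallest head."""
    # One queue per first buffer; the left end models the buffer output.
    buffers = [deque(seq) for seq in first_sequences]
    merged = []
    while any(buffers):  # an empty deque is falsy
        # Compare the head data of all non-empty buffers and pick the smallest.
        idx = min(
            (i for i, buf in enumerate(buffers) if buf),
            key=lambda i: buffers[i][0],
        )
        merged.append(buffers[idx].popleft())  # the read removes the head
    return merged


# The four first sequences received by IB0..IB3 in the example above.
table1 = [[3, 5, 6, 9], [7, 13, 25, 26], [2, 8, 9, 15], [33, 36, 50, 72]]
print(merge_mode1(table1))
```

Each read removes exactly one data item, so M sequences of length N are fully merged after M×N reads.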
It should be understood that, based on the above manner 1, the processor can sort the M ordered sequences by executing the first instruction 1 time, so that the number of sorting instructions to be executed and the execution time can be reduced, and the sorting efficiency can be improved.
TOPK sorting can also be completed early while the sorting circuit 130 arranges the M ordered sequences into one ordered sequence. For example, referring again to Table 1, if the smallest 3 data are to be found (i.e., TOP3 sorting is performed), then once the sorter 210 has read out the third data (5) while reading the 4 first buffers 220, the smallest 3 data (2, 3, 5) have been found. At this point, the sorting circuit 130 can stop sorting and skip the data after the 3rd data, which further reduces the amount of computation, saves sorting time, and improves sorting efficiency.
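The early stop for TOPK sorting can be sketched with a lazy merge. The sketch assumes an ordering rule from small to large and uses Python's standard `heapq.merge` as a software stand-in for the sorter; the function name `top_k_smallest` is hypothetical.

```python
import heapq
from itertools import islice


def top_k_smallest(ordered_sequences, k):
    """Return the k smallest data, stopping the merge after k reads."""
    # heapq.merge yields items lazily, so islice ends the merge after
    # k items and the remaining data are never ranked.
    return list(islice(heapq.merge(*ordered_sequences), k))


table1 = [[3, 5, 6, 9], [7, 13, 25, 26], [2, 8, 9, 15], [33, 36, 50, 72]]
print(top_k_smallest(table1, 3))  # [2, 3, 5]
```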
In addition, if the sorting circuit 130 needs to sort more than M ordered sequences into one ordered sequence, it may be implemented by multiple rounds of sorting. For example, assuming that m=4 and 10 ordered sequences need to be arranged into one ordered sequence, the 10 ordered sequences may first be divided into 3 groups, the first group comprising 4 ordered sequences, the second group comprising 4 ordered sequences, and the third group comprising 2 ordered sequences. The sorting circuit 130 may then arrange the ordered sequences in each group into one ordered sequence, respectively, outputting 3 ordered sequences. Finally, the sorting circuit 130 arranges the 3 ordered sequences into one ordered sequence, thereby achieving the purpose of arranging more than M ordered sequences into one ordered sequence.
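The multi-round grouping described above can be sketched as follows. The function name `merge_in_rounds` and the test data are illustrative assumptions, and `heapq.merge` again stands in for one pass through the sorting circuit.

```python
import heapq


def merge_in_rounds(ordered_sequences, m=4):
    """Merge any number of ordered sequences, at most m per group per round."""
    seqs = [list(s) for s in ordered_sequences]
    counts = [len(seqs)]  # number of ordered sequences before each round
    while len(seqs) > 1:
        # Split into groups of at most m sequences and merge each group.
        seqs = [
            list(heapq.merge(*seqs[i:i + m]))
            for i in range(0, len(seqs), m)
        ]
        counts.append(len(seqs))
    return (seqs[0] if seqs else []), counts


# Ten made-up ordered sequences of length 4.
ten = [list(range(start, 40, 10)) for start in range(10)]
merged, counts = merge_in_rounds(ten, m=4)
print(counts)  # [10, 3, 1]
```

With M = 4, the 10 sequences form groups of 4, 4 and 2 in the first round, and the 3 resulting sequences are merged in the second round.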
In mode 2, the sequencer 210 is configured to read M ordered sequences with length N from the M first buffers 220 if there is an unordered sequence in the M first sequences. Specifically, if there are unordered sequences in the M first sequences, the sequencer 210 reads M ordered sequences with a length of N in the M first buffers 220 according to the second reading rule, that is, the sequencing circuit 130 may implement a function of arranging the M sequences with unordered sequences into M ordered sequences.
The second reading rule may be: each time, one ordered sequence of length N is read out from the M first buffers 220. Specifically, the sorter 210 may read out x data from each first buffer 220 at a time, and then arrange the M×x data into one ordered sequence, where x is a positive integer. For example, let x = 1 and M = 4, and let the first sequences stored in the 4 first buffers 220 be "10, 8, 20, 3", "1, 25, 33, 7", "6, 4, 16, 23" and "11, 15, 16, 18". The sorter 210 may then read 1 data from each of the 4 first buffers 220 for the first time, i.e., read "10, 1, 6, 11", and arrange these 4 data into one ordered sequence: "1, 6, 10, 11". Optionally, when the first buffers 220 are implemented using input buffers, the sorter 210 may read x data at the output end of each first buffer 220 at a time. It will be appreciated that, since x may be any integer greater than or equal to 1, a larger value of x lets the sorter 210 read more data from the M first buffers 220 at a time for sorting, thereby improving sorting efficiency.
Illustratively, assuming that the ordering rule of the ordered sequence is from small to large, n=4, m=4, and the 4 first buffers 220 are respectively: IB0, IB1, IB2, IB3, table 2 is a table of the first sequences received by the 4 first buffers 220, respectively. Wherein the first sequence received by IB0 is: "9, 6, 20, 3", the first sequence received by IB1 is: "6, 25, 33, 7", the first sequence received by IB2 is: "1, 9, 8, 2", the first sequence received by IB3 is: "2, 50, 36, 33".
TABLE 2
First buffer    First sequence
IB0             9, 6, 20, 3
IB1             6, 25, 33, 7
IB2             1, 9, 8, 2
IB3             2, 50, 36, 33
Referring to Table 2, the sorter 210 may first read the head data of each of the 4 first sequences, i.e., 9, 6, 1, 2, and sort them according to the ordering rule to obtain one ordered sequence: "1, 2, 6, 9". After these data are read out, the head data of the 4 first sequences become 6, 25, 9, 50. The sorter 210 may then read out these head data and sort them according to the ordering rule to obtain one ordered sequence: "6, 9, 25, 50". By analogy, the sorter 210 may read the data in the 4 first buffers 220 multiple times according to the second reading rule, and the read data form 4 ordered sequences of length 4: "1, 2, 6, 9", "6, 9, 25, 50", "8, 20, 33, 36" and "2, 3, 7, 33".
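The second reading rule can be sketched in the same style. The function name `sort_mode2` is hypothetical, Python's built-in `sorted` stands in for the comparator network of the sorter, and the sketch assumes all first sequences have the same length.

```python
def sort_mode2(first_sequences, x=1):
    """Read x data from every first buffer per round and sort each batch."""
    n = len(first_sequences[0])  # assumes all sequences have length n
    outputs = []
    for start in range(0, n, x):
        batch = []
        for seq in first_sequences:
            batch.extend(seq[start:start + x])  # x data from this buffer
        outputs.append(sorted(batch))  # one ordered sequence per round
    return outputs


# The four first sequences received by IB0..IB3 in the example above.
table2 = [[9, 6, 20, 3], [6, 25, 33, 7], [1, 9, 8, 2], [2, 50, 36, 33]]
for ordered in sort_mode2(table2, x=1):
    print(ordered)
```

With x = 1 and M = N = 4, the four printed lists correspond to the four read rounds walked through above.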
It should be understood that, based on the above manner 2, the processor can sort the M sequences with the unordered sequences by executing the first instruction 1 time, so that the number of sorting instructions to be executed and the execution time can be reduced, and the sorting efficiency can be improved.
The above-described mode 1 and mode 2 may be implemented independently or in combination. When mode 1 is combined with mode 2, the method may include: first, mode 2 is performed to arrange the M first sequences, among which unordered sequences exist, into M ordered sequences; then, mode 1 is performed to arrange the M ordered sequences into one ordered sequence. In other words, the sorting circuit 130 can arrange M sequences into one ordered sequence, i.e., sort a large amount of unordered data. For example, assuming that 10000 unordered data need to be sorted, the 10000 data may be divided into 4 sequences with a length of 2500 and input to the sorting circuit 130 shown in FIG. 2, which sorts them to obtain 4 ordered sequences with a length of 2500. The 4 ordered sequences with a length of 2500 are then input to the sorting circuit 130 shown in FIG. 2 again to obtain one ordered sequence with a length of 10000. Thus, the processor can arrange the M sequences into 1 ordered sequence by executing the first instruction twice. The sorting operation is completed with few instructions and without a large number of repeated instructions, which reduces the number of sorting instructions to be executed and the execution time, and improves sorting efficiency.
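The combination of the two modes for the 10000-data example can be sketched as follows. Several things here are assumptions made only for illustration: the function names, the choice x = N/M (so that each read round yields one ordered sequence of length N), the requirement that the data length divides evenly, and the use of `sorted` and `heapq.merge` as software stand-ins for the sorter.

```python
import heapq
import random


def sort_mode2(first_sequences, x):
    """Mode 2: read x data per buffer per round and sort each batch."""
    n = len(first_sequences[0])
    return [
        sorted(v for seq in first_sequences for v in seq[start:start + x])
        for start in range(0, n, x)
    ]


def merge_mode1(ordered_sequences):
    """Mode 1: merge ordered sequences into one ordered sequence."""
    return list(heapq.merge(*ordered_sequences))


def sort_unordered(data, m=4):
    """Sort unordered data with one mode 2 pass followed by one mode 1 pass."""
    n = len(data) // m  # assumes len(data) is divisible by m
    x = n // m          # assumed choice of x; requires n divisible by m
    first_sequences = [data[i * n:(i + 1) * n] for i in range(m)]
    ordered = sort_mode2(first_sequences, x)  # first execution
    return merge_mode1(ordered)               # second execution


random.seed(0)
data = [random.randrange(100000) for _ in range(10000)]
result = sort_unordered(data)
print(result == sorted(data))
```

With 10000 data and M = 4, this gives N = 2500 and x = 625, so the first pass outputs 4 ordered sequences of length 2500 and the second pass merges them.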
Because the first buffer 220 in the sorting circuit 130 shown in FIG. 2 may be implemented by a built-in buffer, its read-write speed is faster than that of an external memory (such as a global memory), and it may serve as a first-level buffer between the external memory and the sorter 210. This avoids the sorter 210 reading data directly from the external memory, thereby reducing the delay with which the sorter 210 reads data and further improving sorting efficiency. In addition, since the sorting controller 230 controls the input of the data to be sorted, it can control the data sorting process of the sorter 210. For example, the sorting controller 230 may control the rate at which data is received and the rate at which data is transmitted to the M first buffers 220, thereby controlling the rate at which the sorter 210 outputs an ordered sequence, and hence the sorting speed of the sorting circuit.
When the first buffer 220 is implemented as an input buffer, the first buffer 220 may store the sequence in the form of a queue. Wherein the sequencer 210 may read data from the head of the queue (i.e., the output of the first buffer 220) when reading data from the first buffer 220. The first buffer 220, upon receiving the first sequence from the ordering controller 230, may write data at the end of the queue (i.e., at the input of the first buffer 220).
For example, FIG. 3 is a schematic diagram of the first buffer 220 receiving and transmitting data. As shown in FIG. 3, assume that a maximum of 10 data can be stored in one first buffer 220, that the length of the first sequence received by the first buffer 220 is 20, and that the first buffer 220 has already received 10 data of the first sequence: 4, 6, 7, 14, 16, 22, 23, 51, 71, 89. When the sorter 210 reads the head-of-queue data 4 from the first buffer 220, the remaining data in the first buffer 220 each move one position toward the head of the queue, i.e., data 6 becomes the new head-of-queue data and data 89 remains the tail-of-queue data. Since 10 data of the first sequence have not yet been received by the first buffer 220, the first buffer 220 may continue to receive data of the first sequence; assuming that the next data received by the first buffer 220 is 90, then 90 becomes the new tail-of-queue data. By analogy, as the sorter 210 reads data from the first buffer 220, the first buffer 220 can receive all of the data in the first sequence in order. Thus, even if the capacity of the first buffer 220 is smaller than the length of the first sequence, the first buffer 220 can receive a sequence of arbitrary length.
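The queue behaviour of the first buffer can be modelled with a bounded FIFO. This is a sketch only: the class `FirstBuffer`, the function `stream_through`, and the data values after 90 are hypothetical and introduced for illustration.

```python
from collections import deque


class FirstBuffer:
    """Bounded FIFO: written at the tail (input), read at the head (output)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def has_space(self):
        return len(self.queue) < self.capacity

    def write(self, value):
        if not self.has_space():
            raise OverflowError("first buffer is full")
        self.queue.append(value)

    def read(self):
        return self.queue.popleft()


def stream_through(sequence, capacity):
    """Pass a sequence of any length through a buffer of fixed capacity."""
    buf = FirstBuffer(capacity)
    pending = deque(sequence)
    out = []
    while pending or buf.queue:
        while pending and buf.has_space():  # refill whenever space is free
            buf.write(pending.popleft())
        out.append(buf.read())              # the sorter reads the head
    return out


# First 10 data and the value 90 come from the example; 91..99 are made up.
first_sequence = [4, 6, 7, 14, 16, 22, 23, 51, 71, 89] + list(range(90, 100))

buf = FirstBuffer(10)
for value in first_sequence[:10]:
    buf.write(value)
head = buf.read()   # the sorter reads 4
buf.write(90)       # one slot is free again, so 90 is accepted
print(head, buf.queue[0], buf.queue[-1])  # 4 6 90
```

Calling `stream_through(first_sequence, 10)` returns all 20 data in their original order even though the buffer never holds more than 10 at a time.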
Optionally, to further reduce the latency of receiving data by the first buffer 220, the ordering circuit 130 shown in fig. 2 may further include a second buffer 240 (shown in fig. 2 with a dashed box), the second buffer 240 being coupled to the ordering controller 230. The second buffer 240 may be implemented by a Unified Buffer (UB), a cache (cache), or a static memory (static random access memory, SRAM), or the like.
The second buffer 240 may implement a function of temporarily storing data, and a specific implementation procedure will be described below.
The ordering controller 230 may be configured to read the M first sequences according to the storage address, and store the M first sequences in the second buffer 240. The ordering controller 230 may also be configured to move the M first sequences from the second buffer 240 to the M first buffers 220.
Illustratively, assuming that the M first sequences are stored in the global memory, the ordering controller 230 may send a first read instruction to the global memory according to the storage address and through the bus, the first read instruction being used to instruct the global memory to send the M first sequences to the second buffer 240. The second buffer 240 may then receive the M first sequences from the global memory and send the M first sequences to the ordering controller 230. Then, the sorting controller 230 receives the M first sequences from the second buffer 240 and transmits one first sequence to each of the first buffers 220, thereby implementing that the M first sequences are moved from the second buffer 240 to the M first buffers 220, and the data sorting is performed by the M first buffers 220 and the sorter 210. In this process, the second buffer 240 may implement a function of temporarily storing data.
In addition, since the second buffer 240 may be implemented by using a unified buffer, a cache memory, or a static memory, the read/write speed is faster than that of an external memory (such as a global memory), and the second buffer may be used as a second level buffer between the external memory and the sequencer 210, so as to avoid the first buffer 220 from reading data from the external memory, further reduce the delay of receiving data from the first buffer 220, and further reduce or even eliminate the delay of reading data from the first buffer 220 by the sequencer 210, thereby further improving the sequencing efficiency.
Further, the above-mentioned ordering controller 230 may include a first read-write controller 231 (shown in a dashed box in fig. 2) and a second read-write controller 232 (shown in a dashed box in fig. 2), where the first read-write controller 231 is coupled to the second read-write controller 232, the second buffer 240 is coupled to the first read-write controller 231 and the second read-write controller 232, respectively, and the M first buffers 220 are coupled to the first read-write controller 231.
The first read-write controller 231, the second read-write controller 232, and the second buffer 240 may be combined to realize a function of continuously forwarding data to be ordered, and specific implementation procedures are described below.
The second read/write controller 232 is configured to read the M first sequences according to the storage address, and store the M first sequences in the second buffer 240. The first read/write controller 231 is configured to move the M first sequences from the second buffer 240 to the M first buffers 220.
Specifically, the second read/write controller 232 is configured to send a first read command to read the M first sequences, and store the M first sequences in the second buffer 240. The first read command may refer to the above description about the function of the second buffer 240 for temporarily storing data, which is not described herein.
The second read/write controller 232 is further configured to receive the first information and send the first information to the first read/write controller 231.
The first information may be: information about the data of the M first sequences that has been written to the second buffer 240. In other words, the first information may indicate which data of the M first sequences has been written into the second buffer 240. For example, if the M first sequences contain 10000 data in total and 2000 of the 10000 data have been stored in the second buffer 240, then the first information may indicate the storage addresses of these 2000 data.
In a possible implementation manner, the second read-write controller 232 is further configured to receive the first information, and may include: the second read/write controller 232 receives first information from other memories (e.g., global memory) indicating data written to the second buffer 240 in M first sequences.
In another possible implementation, the second read-write controller 232 is further configured to receive the first information, and may include: the second read/write controller 232 and the second buffer 240 simultaneously receive (as shown in fig. 2, through the same data line) M first sequences, and the second read/write controller 232 determines the data already stored in the second buffer 240, that is, determines the first information, based on the received M first sequences. Of course, the second read/write controller 232 may not store data, but only record the storage addresses of these data in the second buffer 240.
The first read/write controller 231 is configured to send a second read instruction to the second buffer 240 based on the first information. The second read instruction is used for requesting M first sequences.
Alternatively, if the M-th first buffer 220 of the M first buffers 220 has unused memory space, the first read/write controller 231 sends the second read command to the second buffer 240 based on the first information.
The second buffer 240 is further configured to receive M first sequences.
The second buffer 240 is further configured to receive a second read command from the first read/write controller 231 and send M first sequences to the first read/write controller 231.
The first read/write controller 231 is further configured to receive the M first sequences from the second buffer 240, and send one first sequence to each of the first buffers 220, so as to move the M first sequences from the second buffer 240 to the M first buffers 220.
When the function of continuously forwarding the data to be sequenced is realized, the method comprises the following two cases:
in case 1, the storage space of the second buffer 240 is greater than or equal to the storage space occupied by M first sequences.
In case 2, the storage space of the second buffer 240 is smaller than the storage space occupied by the M first sequences.
The following describes a procedure for realizing the above-described function of continuously forwarding data to be sorted, respectively, for the above two cases.
For case 1, the storage space of the second buffer 240 is greater than or equal to the storage space occupied by the M first sequences, that is, the second buffer 240 can receive all the data to be sorted at one time. In this case, after the second read/write controller 232 sends the first read instruction, the second buffer 240 may receive the M first sequences at one time. Then, the first read/write controller 231 sends a second read instruction to the second buffer 240 based on the first information, reads the M first sequences from the second buffer 240, and sends one first sequence to each of the first buffers 220. Of course, the first read/write controller 231 may not store the received data, but forward the received data directly to the M first buffers 220.
For case 2, the storage space of the second buffer 240 is smaller than the storage space occupied by the M first sequences, that is, the second buffer 240 cannot receive all the data to be sorted at one time. In this case, the second buffer 240 may receive all the data to be sorted through multiple write processes and multiple read processes.
For each write process, the second read/write controller 232 may send a first read instruction that requests part of the data in the M first sequences, where the storage space occupied by this part of the data is smaller than or equal to the storage space of the second buffer 240. The second buffer 240 then receives this part of the data in the M first sequences, thereby completing one write process.
For each read process, the second read-write controller 232 may receive the first information and send the first information to the first read-write controller 231. The first read/write controller 231 may read out the data written in the second buffers 240 in the M first sequences based on the first information, and send a portion of the data of the first sequence to each of the first buffers 220, thereby completing one read-out process.
To ensure that each first buffer 220 can receive data of its first sequence, and to avoid the situation in which sorting cannot be completed because a first buffer 220 holds no data of a first sequence, in case 2 the first read instruction in each readout process may request part of the data in the M first sequences such that every first sequence contributes the same proportion of that part. For example, assuming that M = 4, each first sequence has a length of 2500, and the first read instruction requests 100 data of the M first sequences, each first sequence may account for 25 of the 100 data.
It should be appreciated that, during the multiple write and read processes of the second buffer 240, the second buffer 240 may continuously receive part of the data to be sorted, and the first read/write controller 231 may continuously read the part of the data to be sorted already stored in the second buffer 240 and continuously send part of the data of one first sequence to each of the first buffers 220. In this way, the first read/write controller 231, the second read/write controller 232, and the second buffer 240 together continuously receive the data to be sorted and continuously send it to the M first buffers 220, so that the sorter 210 can sort continuously. This realizes the function of continuously forwarding the data to be sorted and further improves the sorting efficiency.
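The equal-share rule in the example above can be sketched in Python. This is only an illustrative sketch, not the circuit's actual logic: the function name `split_request` and the `offsets` bookkeeping are hypothetical, and the request size is assumed to be divisible by M.

```python
def split_request(request_size, seq_lengths, offsets):
    """Return one (start, end) slice per first sequence for a partial read.

    Hypothetical helper: every sequence contributes the same number of data
    items, so no first buffer is left without data.
    """
    m = len(seq_lengths)
    share = request_size // m  # equal share per first sequence
    slices = []
    for length, start in zip(seq_lengths, offsets):
        end = min(start + share, length)  # do not read past the end of a sequence
        slices.append((start, end))
    return slices


# M = 4 sequences of length 2500, one request for 100 data items
slices = split_request(100, [2500] * 4, [0] * 4)
print(slices)  # [(0, 25), (0, 25), (0, 25), (0, 25)]
```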
It should be noted that, for case 2, in the write-read process, the first read command may request the complete M first sequences. During the write-many read process, the second read instruction may request the complete M first sequences, such that the second buffer 240 sends the complete M first sequences to the first read-write controller 231.
In addition, since the first read/write controller 231 can read out the data in the second buffer 240, the first read/write controller 231 may be referred to as a local read controller (local read control, LRC). Since the second read-write controller 232 can read out the data to be ordered in other memories, the second read-write controller 232 can be referred to as an automatic read controller (auto read control, ARC).
Still further, the first read/write controller 231 is further configured to send the moved data amount information to the second read/write controller 232 when M first sequences are moved.
The moved data amount information may be: the first read/write controller 231 has read out the data amounts of the M first sequences from the second buffer 240. Since the moved data amount information corresponds to the information of the free storage space size of the second buffer 240, the second read/write controller 232 may send the first read command according to the moved data amount information, so that the data which is not written into the second buffer 240 in the M first sequences is written into the second buffer 240 as soon as possible. In this way, the sorting efficiency can be further improved.
Illustratively, in each readout process of case 2 described above, the first read-write controller 231 may also transmit the moved data amount information to the second read-write controller 232 when receiving M first sequences. In each writing process of the above case 2, the second read-write controller 232 may also send the first read instruction according to the moved data amount information. For example, when the moved data amount information indicates that the second buffer 240 has been read out of 100 data, the second read/write controller 232 transmits a first read instruction to request the remaining data in the M first sequences to be moved to the second buffer 240.
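The interplay of write processes, read processes, and the moved data amount information in case 2 can be modeled in software. The sketch below is an assumption-laden simulation, not the hardware behavior: `stream_through_buffer`, `capacity`, and `read_burst` are hypothetical names, the second buffer is modeled as a FIFO, and the capacity is assumed to be at least M.

```python
from collections import deque


def stream_through_buffer(sequences, capacity, read_burst):
    """Move M sequences into M first buffers through a small second buffer."""
    m = len(sequences)
    offsets = [0] * m                     # next unread item of each sequence in external memory
    second_buffer = deque()               # holds (sequence_id, item) pairs
    first_buffers = [[] for _ in range(m)]
    free_space = capacity                 # ARC's view of free space in the second buffer

    while second_buffer or any(o < len(s) for o, s in zip(offsets, sequences)):
        # Write process: ARC requests an equal share of each sequence
        # that fits into the currently free space.
        share = free_space // m
        for sid, seq in enumerate(sequences):
            chunk = seq[offsets[sid]:offsets[sid] + share]
            second_buffer.extend((sid, item) for item in chunk)
            offsets[sid] += len(chunk)
            free_space -= len(chunk)
        assert len(second_buffer) <= capacity

        # Read process: LRC moves up to read_burst items to the first buffers
        # and reports the moved data amount back to ARC.
        moved = 0
        while second_buffer and moved < read_burst:
            sid, item = second_buffer.popleft()
            first_buffers[sid].append(item)
            moved += 1
        free_space += moved               # moved data amount == newly freed space

    return first_buffers


seqs = [list(range(i, 40, 4)) for i in range(4)]   # M = 4 sequences of 10 items
result = stream_through_buffer(seqs, capacity=8, read_burst=6)
```

In this model the moved data amount is the only feedback ARC needs: it equals the space freed in the second buffer, so the next write process can be sized without overflowing it.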
Based on the above description of the first read/write controller 231, the second read/write controller 232, and the second buffer 240, it is understood that the first read/write controller 231, the second read/write controller 232, and the second buffer 240 may be combined to realize: when the second buffer 240 has free memory space, writing part or all of the M first sequences of data into the second buffer 240; and when part or all of the M first sequences of data are stored in the second buffer 240, the data are read out and input to the M first buffers 220, so that the sequencer 210 can sequentially sequence the data to be sequenced.
Further, the sorting controller 230 may further include a third read-write controller 233 (shown in dashed line in fig. 2), and the third read-write controller 233 may be coupled to the sorting device 210 and the second buffer 240, respectively. Optionally, the third read-write controller 233 may be coupled to the sequencer 210 in a manner that includes: sequencer 210 is coupled to an output buffer 250 (shown in phantom in fig. 2), and output buffer 250 is coupled to third read-write controller 233.
The output buffer 250 may be implemented using a register or a register-related circuit component, and may be used to store a sequence. For example, the output buffer 250 may be implemented as a built-in buffer; in particular, the output buffer 250 may be an output buffer (OB).
The third read/write controller 233 can implement the function of writing out ordered data, and specific implementation procedures are described below.
The third read/write controller 233 is configured to receive the ordered sequence output by the sequencer 210, and send the ordered sequence output by the sequencer 210 to the second buffer 240. In other words, the third read-write controller 233 is configured to move the ordered sequence output from the sequencer 210 to the second buffer 240.
For example, the sequencer 210 may write an ordered sequence to the output buffer 250, and then the third read-write controller 233 reads the ordered sequence from the output buffer 250 and transmits the ordered sequence to the second buffer 240. In this way, the third read-write controller 233 can realize a function of writing out the ordered data. In addition, since the second buffer 240 may be implemented by using a unified buffer, a cache memory, or a static memory, the read/write speed is faster than that of the external memory, and may be used as a second level buffer between the external memory and the sequencer 210, the third read/write controller 233 may send the sequenced data to the second buffer 240, and may avoid sending the sequenced data to the external memory, thereby reducing the delay of outputting the sequenced data by the first buffer 220 and improving the sequencing efficiency.
Optionally, if the length of the ordered sequence that was output by the sorter 210 and is stored in the second buffer 240 is greater than an output threshold, the third read/write controller 233 is configured to read that ordered sequence from the second buffer 240 and send it to an external memory (e.g., global memory) according to the state of the bus (e.g., whether the bus is occupied).
In addition, the third read/write controller 233, which receives the ordered sequence output by the sequencer 210, may also be configured to send the ordered sequence output by the sequencer 210 to an external memory (e.g., global memory).
Since the third read-write controller 233 can write the ordered data to the second buffer 240 or other memory, the third read-write controller 233 can be referred to as an auto-write controller (auto write control, AWC).
It should be noted that, the above-mentioned sequence controller 230 and the first read-write controller 231, the second read-write controller 232, and the third read-write controller 233 in the sequence controller 230 may be implemented by one or more of the following: one or more gates coupled together, one or more field programmable gate arrays (field programmable gate array, FPGA), one or more central processing units (central processing unit, CPU), or application specific integrated circuits (application specific integrated circuit, ASIC), etc.
Since sorter 210 may merge multiple sequences into one ordered sequence, sort circuit 130 described above may also be referred to as a merge sort (mergesort) circuit.
Based on the processor 100 shown in fig. 1, the sorting circuit 130 in the processor 100 may be configured to, in response to the decoded first instruction, arrange M ordered sequences into one ordered sequence with a length of M×N, or arrange M unordered sequences into M ordered sequences each with a length of N. In other words, the processor 100 may sort the sequences by executing one first instruction. The processor 100 can therefore complete the sorting operation with fewer instructions and avoid executing a large number of repeated instructions, which reduces the number of sorting instructions to be executed and the execution time, thereby improving the sorting efficiency. The processor 100 shown in fig. 1 may include a plurality of sorting circuits 130 to implement multi-core parallel sorting; for the multi-core parallel sorting process, refer to fig. 15 described below.
In addition, the data structure of each data included in the first sequence may be: { value (score), index (index) }. Wherein "numerical value" may be implemented using any of the following: a 16-bit floating-point number (FP), a 32-bit floating-point number, an 8-bit Integer (INT), a 16-bit integer, or a 32-bit integer, etc. The "index" corresponds to the "numerical value", and includes the address of various information of the "numerical value", and can be implemented by using a pointer, that is, various information of the corresponding "numerical value" can be acquired through the "index". When ordering such data, the sequencer 210 may compare the sizes of the data based on the "numerical value" of the respective data, thereby implementing the ordering function. Thus, the processor 100 can process data with wider bit number, and redundant data is not generated in the ordering process, so that the bus utilization rate and the ordering efficiency are improved.
For example, fig. 4 is a schematic diagram of the sorting circuit 130 according to an embodiment of the present application for sorting M sequences, and referring to fig. 4, the M first sequences include: sequence 0, sequence 1, …, sequence M-1, each sequence comprising N data, the data structure of each data being: { value, index }, the sorting circuit 130 (mergesort) may sort the M sequences, and output an ordered sequence.
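As a software analogue of fig. 4, the following Python sketch merges M ordered sequences of {value, index} records into one ordered sequence by comparing only the "value" field. The `Record` type and `merge_sequences` function are hypothetical names introduced for illustration; the sketch uses the standard-library `heapq.merge` and says nothing about how the sorting circuit 130 is built internally.

```python
import heapq
from typing import NamedTuple


class Record(NamedTuple):
    value: float   # the "numerical value" used for comparison
    index: int     # the "index" that travels with the value


def merge_sequences(sequences):
    """Merge M ordered sequences into one ordered sequence, keyed on value."""
    return list(heapq.merge(*sequences, key=lambda r: r.value))


seq0 = [Record(1.0, 10), Record(4.0, 11)]
seq1 = [Record(2.0, 20), Record(3.0, 21)]
merged = merge_sequences([seq0, seq1])
print([r.index for r in merged])  # [10, 20, 21, 11]
```

Because each record carries its index, no separate pass is needed to recover which original item a sorted value belongs to.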
The embodiments shown in fig. 1-4 above illustrate a processor 100 provided by an embodiment of the present application, and another processor 500 provided by an embodiment of the present application is described below with reference to fig. 5-9.
Referring to fig. 5, fig. 5 shows another processor 500 according to an embodiment of the application. The processor 500 includes: instruction storage circuitry 510, control circuitry 520, and sequencing circuitry 530, instruction storage circuitry 510 being coupled with control circuitry 520, control circuitry 520 being coupled with sequencing circuitry 530.
The processor 500 may be a central processing unit (central processing unit, CPU), and the processor 500 may also be another general purpose processor, a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor, such as a vector processor (vector processor), a coprocessor (coprocessor), or an ARM (advanced RISC machines) processor, and embodiments of the application are not limited in this respect.
The instruction storage circuitry 510 may store a plurality of instructions. The instruction storage circuitry 510 may be implemented using an instruction cache or instruction registers, and may be included in a memory subsystem (memory subsystem) of the processor 500. The control circuitry 520 may pre-process instructions, where the pre-processing includes instruction fetching, decoding, and the like. The control circuitry 520 may be a control unit in the processor 500, which may also be referred to as a front end (front end). The sorting circuit 530 may be implemented using a microarchitecture, and may be included in an execution unit (execution engine) of the processor 500, which may also be referred to as an arithmetic unit.
In an embodiment of the present application, the sorting circuit 530 may include I sorters 531, each sorter 531 comprising J inputs and J outputs, where I is a positive integer and J is a positive even number.
The J inputs of the 1st sorter 531 of the I sorters 531 are the J inputs of the sorting circuit 530. That is, the J inputs of the 1st sorter 531 may be used to receive the data to be sorted.
The J outputs of the I-th sorter 531 of the I sorters 531 are the J outputs of the sorting circuit 530. That is, the J outputs of the I-th sorter 531 may be used to output the sorted data.
The J outputs of the i-th sorter 531 of the I sorters 531 are connected to the J inputs of the (i+1)-th sorter 531, respectively; that is, the I sorters 531 are connected in sequence, where i ≤ I. Of course, since the different sorters 531 have the same structure, the sorters may be multiplexed in an actual implementation to save physical resources. For example, the I sorters 531 included in the sorting circuit 530 may actually multiplex one physical sorter 531.
The control circuitry 520 is configured to read the second instruction from the instruction storage circuitry 510 and decode the second instruction. The decoded second instruction includes the memory addresses of H data, where H ≤ J. The control circuitry 520 is also configured to send the decoded second instruction to the sorting circuit 530. The sorting circuit 530, in response to the decoded second instruction, performs the following steps: reading the H data according to the memory addresses, and sorting the H data by the i-th sorter 531 in the sorting circuit 530, where the sorted H data has a higher degree of order than the H data before sorting.
Next, the sorting circuit 530 is described with reference to fig. 6. Fig. 6 is a schematic structural diagram of the sorting circuit 530 according to an embodiment of the present application, and shows the structure of the sorting circuit 530 with I = 4 and J = 8.
Referring to fig. 6, the sorting circuit 530 includes 4 sorters 531, and each sorter 531 includes 8 inputs and 8 outputs. The 8 inputs of each sorter 531 are I1, I2, I3, …, I8, and the 8 outputs of each sorter 531 are O1, O2, O3, …, O8. The 8 inputs of the 1st sorter 531 are the 8 inputs of the sorting circuit 530, and the 8 outputs of the 4th sorter 531 are the 8 outputs of the sorting circuit 530.
Table 3 shows how the data to be sorted is input to the sorting circuit 530 shown in fig. 6. Referring to table 3, if the data to be sorted is "15, 8, 9, 6, 7, 20, 13, 1" (i.e., H = 8), the sequence is input to the 8 inputs of the sorting circuit 530, namely I1, I2, I3, …, I8 of the 1st sorter 531, in sequence order: 15 is input to I1, 8 is input to I2, …, and 1 is input to I8. That is, the data to be sorted is input to the sorting circuit 530 in the order of the J inputs.
Table 3
Input | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8
Data to be sorted | 15 | 8 | 9 | 6 | 7 | 20 | 13 | 1
Table 4 is a table of the cases where the sorted data are output by sorting circuit 530 shown in fig. 6. Referring to table 4, if the ordered data is: "1, 6, 7, 8, 9, 13, 15, 20", then when the sequence is output from the 8 outputs of the sorting circuit 530, the sequence may be sequentially output from O1, O2, O3, …, O8 of the 4 th sorter 531 in the order of the sequence, i.e., 1 is output from O1, 6 is output from O2, …, 20 is output from O8, i.e., the sorted data is output from the sorting circuit 530 in the order of the J outputs.
Table 4
Output | O1 | O2 | O3 | O4 | O5 | O6 | O7 | O8
Sorted data | 1 | 6 | 7 | 8 | 9 | 13 | 15 | 20
When data is sorted using the sorting circuit 530 described above, the i-th sorter 531 may be used to receive H data. The i-th sorter 531 may be further configured to output the H data after sorting, where the degree of order of the H data after sorting is higher than that of the H data before sorting.
It should be appreciated that each sorter 531 can sort the H data received and increase the order of the H data during the sorting process. Thus, the order of the H data is increased once every time the H data passes through one sequencer 531 from the input of the 1 st sequencer 531, so that the H data can be finally sequenced into an ordered sequence through 1 or more sequencers 531. In this way, the sorting circuit 530 can implement a function of sorting data.
In some possible embodiments, the i-th sorter 531 may include K comparators, where K ≥ J/2 and K is a positive integer. The k-th comparator of the K comparators is configured to receive two data of the H data. The k-th comparator is further configured to compare the two data and output the two compared data.
The K comparators are described below with reference to fig. 6 and 7.
Referring to fig. 6, taking the 1 st sequencer 531 as an example, the 1 st sequencer 531 includes 7 comparators (shown in a dotted line box in fig. 6), each of which can receive two data and compare the sizes of the two data, and output the two data after comparison.
When the comparator outputs the two compared data, the following two output modes exist:
Mode 3: the two compared data are output in order from small to large. For example, assuming that the comparator A1 of the 1st sorter 531 receives the value 8 from I1 and the value 7 from I2, the comparator A1 may exchange the order of the two data before outputting them, that is, send 7 to O1 and 8 to the comparator B1.
Mode 4: the two compared data are output in order from large to small. For example, assuming that the comparator A1 of the 1st sorter 531 receives the value 8 from I1 and the value 7 from I2, the comparator A1 sends 8 to O1 and 7 to the comparator B1.
It will be appreciated that when the comparators output the two compared data, which output mode is selected is determined by the sorting mode of the sorting circuit 530, for example, when the sorting circuit 530 sorts the data from small to large, each comparator in the sorting circuit 530 operates in the above-described mode 3; when the sorting circuit 530 sorts the data from large to small, each comparator in the sorting circuit 530 operates in the manner 4 described above. Wherein the ordering of ordering circuit 530 may be configured.
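The two output modes of a comparator can be expressed as a small compare-exchange function. This is a minimal Python sketch; the name `compare_exchange` and the `ascending` flag are hypothetical stand-ins for the configurable sorting mode.

```python
def compare_exchange(a, b, ascending=True):
    """Model of one comparator with two inputs and two outputs.

    ascending=True corresponds to mode 3 (small to large),
    ascending=False corresponds to mode 4 (large to small).
    """
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi) if ascending else (hi, lo)


# Comparator A1 receives 8 from I1 and 7 from I2.
print(compare_exchange(8, 7, ascending=True))   # (7, 8): 7 goes to O1, 8 to B1
print(compare_exchange(8, 7, ascending=False))  # (8, 7): 8 goes to O1, 7 to B1
```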
Referring to fig. 7, taking K = 4 and J = 8 as an example, each sorter 531 may include 4 comparators (A1-A4). Each comparator may include two inputs and two outputs, each input and each output of a comparator may be connected to one register, and the registers may be used to temporarily store data. The 8 inputs of the 4 comparators of the i-th sorter 531 are the 8 inputs of that sorter 531, and the 8 outputs of the 4 comparators of the i-th sorter 531 are the 8 outputs of that sorter 531. The two outputs of A1 of the 1st sorter 531 may be connected to one input of A1 and one input of A2 of the 2nd sorter 531, respectively; the two outputs of A2 of the 1st sorter 531 may be connected to one input of A2 and one input of A3 of the 2nd sorter 531, respectively; the two outputs of A3 of the 1st sorter 531 may be connected to one input of A3 and one input of A4 of the 2nd sorter 531, respectively; and the two outputs of A4 of the 1st sorter 531 may be connected to one input of A4 and one input of A1 of the 2nd sorter 531, respectively. For the connection between any other two adjacent sorters 531, refer to the connection between the 1st sorter 531 and the 2nd sorter 531. Of course, the comparators are not limited to being connected through registers as shown in fig. 7, and may also be directly connected.
Take sorting 8 data (i.e., H = 8) using the sorting circuit 530 shown in fig. 7 as an example, and assume that the sorting circuit 530 sorts the data in order from small to large. First, the data to be sorted input to the 1st sorter 531 is "8, 7, 6, 5, 4, 3, 2, 1"; the 1st sorter 531 sorts the sequence, and the output sequence becomes "7, 8, 5, 6, 3, 4, 1, 2". Then, the 2nd sorter 531 sorts the sequence output from the 1st sorter 531, and the output sequence becomes "2, 5, 8, 3, 6, 1, 4, 7". By analogy, the 7th sorter 531 outputs the sequence "1, 2, 3, 4, 5, 6, 7, 8".
Based on the sorting process shown in fig. 7, the order of the data to be sorted is improved every time the data passes through K comparators of one sorter 531, so that an implementation of the sorter 531 can be provided, so that the sorter 531 can improve the order of H data.
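The fig. 7 example can be reproduced with a short Python model. The model rests on one reading of the wiring that is consistent with the numbers above, and is an assumption rather than a statement of the actual circuit: odd-numbered sorters compare positions (1,2), (3,4), (5,6), (7,8), and even-numbered sorters compare (2,3), (4,5), (6,7) plus a wrap-around pair (8,1) whose smaller value goes to position 1. The function names are hypothetical.

```python
def ring_sorter_stage(data, stage):
    """Apply the stage-th sorter (1-based) of the assumed fig. 7 arrangement."""
    out = list(data)
    n = len(out)
    start = 0 if stage % 2 == 1 else 1     # odd stages pair (0,1),(2,3),...; even stages are shifted by one
    for p in range(start, n, 2):
        q = (p + 1) % n                    # the last comparator of an even stage wraps around
        lo_pos, hi_pos = (p, q) if p < q else (q, p)
        a, b = out[lo_pos], out[hi_pos]
        out[lo_pos], out[hi_pos] = min(a, b), max(a, b)
    return out


def run_ring_sorters(data, num_sorters):
    """Pass the data through num_sorters sorters in sequence."""
    for stage in range(1, num_sorters + 1):
        data = ring_sorter_stage(data, stage)
    return data


data = [8, 7, 6, 5, 4, 3, 2, 1]
print(run_ring_sorters(data, 1))  # [7, 8, 5, 6, 3, 4, 1, 2]
print(run_ring_sorters(data, 2))  # [2, 5, 8, 3, 6, 1, 4, 7]
print(run_ring_sorters(data, 7))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The sketch only claims to match this particular example input; it is not offered as a proof that 7 sorters suffice for every input.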
Alternatively, the K comparators may include: j/2 first comparators and (J/2) -1 second comparators. J-2 output ends of the J/2 first comparators are respectively connected with J-2 input ends of the (J/2) -1 second comparators.
The J input ends of the J/2 first comparators are as follows: the J inputs of the ith sorter 531. The other 2 outputs of the J/2 first comparators and the J-2 outputs of the (J/2) -1 second comparators are: the J outputs of the ith sorter 531.
The implementation and ordering process of the above-described K comparators are described below in connection with fig. 6.
Referring to fig. 6, each sequencer 531 includes 7 comparators including: 4 first comparators (A1-A4) and 3 second comparators (B1-B3), each comprising two inputs and two outputs.
Taking the 7 comparators in the 1st sorter 531 as an example, the 8 inputs of A1-A4 are the 8 inputs of the sorter 531, 6 outputs of A1-A4 are respectively connected to the 6 inputs of B1-B3, and the other 2 outputs of A1-A4 together with the 6 outputs of B1-B3 are the 8 outputs of the 1st sorter 531.
Further, the two inputs of each comparator may each be connected to one register, and the two outputs of each comparator may each be connected to one register, where the registers may be used to temporarily store data. For example, referring to the comparator A1 in the 1st sorter 531 of fig. 6, the two inputs (I1, I2) of the comparator A1 are each connected to one register (shown with a dashed box in fig. 6), and the two outputs of the comparator A1 are each connected to one register.
The sorting circuit 530 is assumed to sort 8 data in order from small to large (i.e., h=8). First, the data to be sorted input to the sorting circuit 530 is: "8, 7, 6, 5, 4, 3, 2, 1", the 1 st sorter 531 sorts the sequences, and the output sequence becomes: "7, 5, 8, 3, 6, 1, 4, 2". Then, the 2 nd sorter 531 sorts the sequence output from the 1 st sorter 531, and the sorted sequence becomes: "5, 3, 7, 1, 8, 2, 6, 4". By analogy, the 4 th sequencer 531 outputs the sequence: "1, 2, 3, 4, 5, 6, 7, 8", thereby realizing the function of ordering data.
It should be understood that, based on the implementation of K comparators and the description of the sorting process in fig. 6, the order of the data to be sorted is improved every time the data passes through K comparators of one sorter 531, so that an implementation of the sorter 531 may be provided, so that the sorter 531 can improve the order of H data.
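The fig. 6 sorter (J/2 first comparators followed by (J/2)-1 second comparators) behaves like two consecutive steps of an odd-even transposition sort. The following Python sketch models it that way and reproduces the intermediate sequences given above; the function names are hypothetical and the registers between comparators are not modeled.

```python
def sorter_fig6(data):
    """One sorter: first comparators A1..A(J/2), then second comparators B1..B(J/2-1)."""
    out = list(data)
    n = len(out)
    for p in range(0, n - 1, 2):           # A comparators: pairs (I1,I2), (I3,I4), ...
        if out[p] > out[p + 1]:
            out[p], out[p + 1] = out[p + 1], out[p]
    for p in range(1, n - 1, 2):           # B comparators: neighbouring outputs of adjacent A comparators
        if out[p] > out[p + 1]:
            out[p], out[p + 1] = out[p + 1], out[p]
    return out


def sorting_circuit_fig6(data, num_sorters=4):
    """Chain num_sorters sorters, as in fig. 6 with I = 4."""
    for _ in range(num_sorters):
        data = sorter_fig6(data)
    return data


data = [8, 7, 6, 5, 4, 3, 2, 1]
print(sorter_fig6(data))                   # [7, 5, 8, 3, 6, 1, 4, 2]
print(sorter_fig6(sorter_fig6(data)))      # [5, 3, 7, 1, 8, 2, 6, 4]
print(sorting_circuit_fig6(data))          # [1, 2, 3, 4, 5, 6, 7, 8]
```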
Optionally, the number of times the I sorters 531 sort the H data is greater than or equal to J/2.
When the number of times the I sorters 531 sort the H data is greater than or equal to J/2, the sorting circuit 530 can sort even the data with the lowest degree of order into an ordered sequence. For example, referring again to the sorting process shown in fig. 6, the data to be sorted input to the sorting circuit 530 is "8, 7, 6, 5, 4, 3, 2, 1", which has the lowest degree of order among the sequences of length 8 and is arranged into an ordered sequence exactly after being sorted by the sorting circuit 530 shown in fig. 6. In other words, when the number of times the I sorters 531 sort the H data is greater than or equal to J/2, the sorting circuit 530 can sort the H data into an ordered sequence.
Two ways of making the number of times the I sorters 531 sort the H data greater than or equal to J/2 are described below.
Mode 5: I and J satisfy the relationship I ≥ J/2. In other words, the number of sorters 531 is greater than or equal to J/2, which ensures that the number of times the I sorters 531 sort the H data is greater than or equal to J/2, so that the sorting circuit 530 can sort the H data into an ordered sequence.
Mode 6: the J inputs of the 1st sorter 531 of the I sorters 531 are connected to the outputs of J selectors, respectively. The first inputs of the J selectors are the J inputs of the sorting circuit 530, and the J outputs of the I-th sorter 531 of the I sorters 531 are connected to the second inputs of the J selectors, respectively.
In this way, after the data to be sorted is output from the J outputs of the I-th sorter 531, it can be input into the sorting circuit 530 again through the second inputs of the J selectors; that is, the data to be sorted can be cyclically sorted in the sorting circuit 530. Therefore, when I is less than J/2, the number of cycles can be controlled so that the number of times the I sorters 531 sort the H data is greater than or equal to J/2, thereby reducing the hardware scale and saving cost.
One implementation of mode 6 is described below.
Referring to fig. 6 again, the 8 inputs (I1-I8) of the 1st sorter 531 are respectively connected to the outputs of 8 selectors (MUX) (shown with a dashed box in fig. 6). The first inputs of the 8 selectors are Din1, Din2, …, Din8, and the second inputs of the 8 selectors are loop1, loop2, …, loop8. The 8 outputs (O1-O8) of the 4th sorter 531 are respectively connected to loop1, loop2, …, loop8. The data to be sorted may be input to the sorting circuit 530 from Din1, Din2, …, Din8; that is, the 8 inputs of the sorting circuit 530 are Din1, Din2, …, Din8. In addition, each selector may be a two-way selector.
Each of these 8 selectors may be connected to a counter (shown in dashed boxes in fig. 6) that may control which input of the selector is gated. For example, when the counter controls the first input terminal of each selector to be gated, the data to be sorted may be input into the sorting circuit 530 from Din1-Din8, and sorted by the sorting circuit 530, when the counter controls the second input terminal of each selector to be gated, the data output by the sorting circuit 530 is input into the sorting circuit 530 through loop1-loop8, and the sorting is continued by the sorting circuit 530.
Therefore, the counter can control the number of times the data to be sorted is cyclically sorted in the sorting circuit 530. By increasing this number, even if I is smaller than J/2, the number of times the I sorters 531 sort the data to be sorted can be made greater than or equal to J/2, which reduces the hardware scale and saves cost.
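The loop-back arrangement of mode 6 can be sketched in Python as follows. The sketch assumes a sorter with the fig. 6 comparator arrangement and models the selector and counter as plain control flow; `sorting_circuit_with_loop`, `num_sorters`, and `passes` are hypothetical names. With I = 2 sorters and 2 passes, the data goes through 4 sorter passes in total, the same as the I = 4 circuit without a loop.

```python
def sorter(data):
    """One sorter with the fig. 6 comparator arrangement (A comparators, then B comparators)."""
    out = list(data)
    n = len(out)
    for first in (0, 1):                       # 0: A comparators, 1: B comparators
        for p in range(first, n - 1, 2):
            if out[p] > out[p + 1]:
                out[p], out[p + 1] = out[p + 1], out[p]
    return out


def sorting_circuit_with_loop(din, num_sorters, passes):
    """Recirculate the data through num_sorters sorters `passes` times."""
    loop = None
    for count in range(passes):                # the counter decides which selector input is gated
        data = din if count == 0 else loop     # first input: Din, second input: loop
        for _ in range(num_sorters):
            data = sorter(data)
        loop = data                            # outputs of the last sorter feed the loop inputs
    return loop


print(sorting_circuit_with_loop([8, 7, 6, 5, 4, 3, 2, 1], num_sorters=2, passes=2))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

The trade-off is latency for area: fewer physical sorters are needed, but the data occupies the circuit for more passes.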
Of course, in implementing mode 6, the J outputs of the i-th sorter 531 of the I sorters 531 may also be connected to the second inputs of the J selectors, respectively.
It should be noted that, the sorting circuit 530 may also be referred to as an initial sorting (initport) circuit, and when the apparatus is implemented, a micro-architecture implementation may be adopted.
In addition, the embodiment of the structure of the comparators in the sorting circuit 530 shown in fig. 6 is not limited to that shown in fig. 6, and in practical applications, the number of comparators in each sorter 531 may be increased or decreased, or the connection manner between the comparators in each sorter 531 may be adjusted. For example, a structural embodiment of the comparator may also be as shown in fig. 8.
Referring to fig. 8, each sequencer 531 includes 7 comparators, including: 3 first comparators (A1-A3) and 4 second comparators (B1-B4), each comprising two inputs and two outputs.
Taking the 7 comparators in the 1st sorter 531 as an example, the 8 outputs of B1-B4 are the 8 outputs of the sorter 531, 6 inputs of B1-B4 are respectively connected to the 6 outputs of A1-A3, and the other 2 inputs of B1-B4 together with the 6 inputs of A1-A3 are the 8 inputs of the 1st sorter 531.
Further, the two inputs of each comparator may each be connected to one register, and the two outputs of each comparator may each be connected to one register, where the registers may be used to temporarily store data. For example, referring to the comparator A1 in the 1st sorter 531 of fig. 8, the two inputs (I2, I3) of the comparator A1 are each connected to one register (shown with a dashed box in fig. 8), and the two outputs of the comparator A1 are each connected to one register.
Assume that the sorting circuit 530 sorts data in ascending order. First, the data to be sorted input to the sorting circuit 530 is: "8, 7, 6, 5, 4, 3, 2, 1". The 1st sorter 531 sorts this sequence and outputs: "6, 8, 4, 7, 2, 5, 1, 3". Then, the 2nd sorter 531 sorts the sequence output by the 1st sorter 531 and outputs: "4, 6, 2, 8, 1, 7, 3, 5". By analogy, the 4th sorter 531 outputs the sequence: "1, 2, 3, 4, 5, 6, 7, 8", thereby realizing the data sorting function.
It should be appreciated that the sorting circuit shown in fig. 8 differs from the sorting circuit shown in fig. 6 but achieves the same effect; that is, it provides another implementation of the sorter 531 with which the sorter 531 can improve the orderliness of J data.
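The worked example for fig. 8 can be reproduced with a short Python sketch. It assumes, as an inference from the quoted intermediate sequences rather than from the figure itself, that A1-A3 compare the inner neighbouring pairs (I2, I3), (I4, I5), (I6, I7) and that B1-B4 then compare the pairs (I1, I2), (I3, I4), (I5, I6), (I7, I8). The function names sorter_pass and sorting_circuit are hypothetical.

```python
def sorter_pass(data):
    """One pass through a single 8-input sorter (pairing inferred, see lead-in)."""
    out = list(data)
    # First comparators A1-A3: inner pairs, 0-based positions (1,2), (3,4), (5,6).
    for lo in range(1, len(out) - 1, 2):
        if out[lo] > out[lo + 1]:
            out[lo], out[lo + 1] = out[lo + 1], out[lo]
    # Second comparators B1-B4: pairs at 0-based positions (0,1), (2,3), (4,5), (6,7).
    for lo in range(0, len(out) - 1, 2):
        if out[lo] > out[lo + 1]:
            out[lo], out[lo + 1] = out[lo + 1], out[lo]
    return out


def sorting_circuit(data, num_sorters=4):
    """Chain num_sorters sorters and return the output of every sorter."""
    stages = []
    for _ in range(num_sorters):
        data = sorter_pass(data)
        stages.append(data)
    return stages


stages = sorting_circuit([8, 7, 6, 5, 4, 3, 2, 1])
for number, sequence in enumerate(stages, start=1):
    print(f"sorter {number}: {sequence}")
```

Under this assumed pairing, the outputs of the 1st, 2nd and 4th sorters match the sequences quoted in the example.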
In a possible embodiment, the sorting circuit 530 may further arrange the H data into 1 ordered sequence by bitonic sorting. Specifically, the i-th sorter 531 described above may include K comparators, where K ≥ J/2 and K is a positive integer. The two inputs of each comparator are each connected to the output of one H-to-1 selector. An H-to-1 selector selects 1 of H data and outputs it. With the H-to-1 selectors, each comparator can select 2 of the H data according to a certain rule, compare them, and output the result.
Referring to fig. 9, fig. 9 is a schematic diagram of a sorting circuit according to another embodiment of the present application. Taking K = 4, J = 8, H = 8 as an example, each sorter 531 may include 4 comparators (A1-A4). Each comparator may include two inputs and two outputs; each input of a comparator may be connected to the output of one H-to-1 selector, and each output of a comparator may be connected to a register used to temporarily store data. The 8 inputs of the 4 comparators of the i-th sorter 531 are the 8 inputs of the sorter 531, and the 8 outputs of the 4 comparators of the i-th sorter 531 are the 8 outputs of the sorter 531. Each comparator of the 1st sorter 531 may compare and output 2 data selected from the 8 data to be sorted according to a certain rule. In other words, the 8 H-to-1 selectors of the 1st sorter 531 may feed the 8 data to be sorted into the 4 comparators for sorting according to a certain rule. By analogy, the 8 H-to-1 selectors of the (i+1)-th sorter 531 may feed the 8 data output by the i-th sorter 531 into the 4 comparators for sorting according to a certain rule.
When the rule is the pairing rule of the bitonic sorting algorithm, the sorting circuit 530 can arrange the H data into 1 ordered sequence by bitonic sorting. It will be appreciated that bitonic sorting is more efficient than the sorting process shown in fig. 6 or fig. 7, so the sorting efficiency of the sorting circuit 530 can be further improved.
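As an illustration of such a rule, the following Python sketch generates the comparator pairings of the textbook bitonic sorting network and applies them stage by stage. The pairing rule is the standard algorithm and is an assumption here, since the source does not spell out the rule used in fig. 9; each (i, j) pair stands in for the settings of the two H-to-1 selectors feeding one comparator, and the names bitonic_stages and bitonic_sort are hypothetical.

```python
def bitonic_stages(n):
    """Yield, for each stage, the (i, j, ascending) pairings for n = 2**k inputs."""
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks with (i & k) == 0 are sorted ascending, others descending.
                    stage.append((i, partner, (i & k) == 0))
            yield stage
            j //= 2
        k *= 2


def bitonic_sort(data):
    """Sort a list whose length is a power of two using the bitonic network."""
    out = list(data)
    for stage in bitonic_stages(len(out)):
        for i, j, ascending in stage:
            # Swap when the pair is out of order for the required direction.
            if (out[i] > out[j]) == ascending:
                out[i], out[j] = out[j], out[i]
    return out


stages = list(bitonic_stages(8))
print(len(stages), [len(stage) for stage in stages])
print(bitonic_sort([8, 7, 6, 5, 4, 3, 2, 1]))
```

For 8 inputs the textbook network has 6 stages of 4 comparators each, which is consistent with the K = 4, J = 8 configuration described for fig. 9.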
Based on the processor 500 shown in fig. 5, the sorting circuit 530 in the processor 500 may, in response to the decoded second instruction, improve the orderliness of the H data I times, so that the sorting circuit 530 can arrange the H data into an ordered sequence by sorting them one or more times. In other words, the processor 500 sorts the H data by executing the second instruction, so it can complete the sorting of the H data with fewer instructions. This avoids a large number of repeated instructions, reduces the number of sorting instructions to be executed and the execution time, and thereby improves sorting efficiency.
The data structure of each of the above H data may also be: {value (score), index (index)}. The "value" may be implemented as any of the following: a 16-bit floating-point number (FP), a 32-bit floating-point number, an 8-bit integer (INT), a 16-bit integer, a 32-bit integer, or the like. The "index" corresponds to the "value" and contains the address of the information associated with the "value"; it may be implemented as a pointer, so that the information associated with a "value" can be obtained through its "index". When sorting such data, the comparators described above compare the data based on the "value" of each datum. In this way, the sorting circuit 530 can process data with a wider bit width, and no redundant data is carried during sorting, which improves bus utilization and sorting performance.
Illustratively, FIG. 10 is a schematic diagram of sorting H data using the sorting circuit 530 of FIG. 6. Referring to FIG. 10, the H data include: {value 0, index 0}, {value 1, index 1}, …, {value H-1, index H-1}. For these H data, the sorting circuit 530 (initsort) may output an ordered sequence: {value 0', index 0'}, {value 1', index 1'}, …, {value (H-1)', index (H-1)'}.
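The {value, index} structure can be pictured with a minimal Python sketch in which the comparison looks only at the value and the index simply travels with it. The class name Record, the helper compare_exchange and the sample data are hypothetical, and Python's built-in sorted stands in for the hardware sorting circuit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    score: float  # the "value" that the comparators compare
    index: int    # hypothetical handle to the information associated with the value


def compare_exchange(a, b):
    """Model of one comparator: return (smaller, larger) judged by score only."""
    return (a, b) if a.score <= b.score else (b, a)


records = [Record(0.3, 0), Record(0.9, 1), Record(0.1, 2), Record(0.5, 3)]

# Stand-in for the sorting circuit: order by score, carrying each index along.
ordered = sorted(records, key=lambda record: record.score)
print([(record.score, record.index) for record in ordered])
```

After sorting, each index still identifies the original item, so the associated information can be looked up without moving it through the sorter.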
In addition, the sorting circuit 530 may sort data of different data types. Assuming that the data type that the sorting circuit 530 can sort is the 32-bit floating-point number, when data of other types (such as 8-bit integers or 16-bit integers) is present in the data to be sorted, that data can be uniformly converted into 32-bit floating-point numbers through data type conversion, so that the sorting circuit 530 can sort data of different data types. For conversion between data types, reference may be made to existing specifications, which are not described herein.
The sorting circuit 530 may also arrange X (X > J) data into a plurality of ordered sequences through multiple rounds of sorting, for example, outputting $\lceil X/J \rceil$ ordered sequences through $\lceil X/J \rceil$ rounds of sorting, where $\lceil X/J \rceil$ denotes X/J rounded up. When Z (Z < J) data are input to the sorting circuit 530, the Z data can be sorted by padding. For example, when the sorting circuit 530 sorts data in ascending order, J-Z infinity values (an infinity value is larger than any input datum and can be predefined) may be input together with the Z data, so that J data are input to the sorting circuit 530; the first Z data in the sequence output by the sorting circuit 530 are then the sorted Z data.
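The multi-round sorting and the padding with infinity values can be sketched as follows. This is a minimal Python illustration, not the circuit itself: list.sort stands in for the J-input sorting circuit, math.inf plays the role of the predefined infinity value, and the function name sort_in_rounds is hypothetical.

```python
import math


def sort_in_rounds(data, j):
    """Sort X items with a J-input sorter: ceil(X/J) rounds, one sequence per round."""
    rounds = math.ceil(len(data) / j)
    sequences = []
    for r in range(rounds):
        chunk = list(data[r * j:(r + 1) * j])
        z = len(chunk)
        # Pad a short chunk with J-Z infinity values so the sorter sees J inputs.
        chunk += [math.inf] * (j - z)
        chunk.sort()  # stand-in for the hardware sorting circuit (ascending)
        # The first Z outputs are the sorted real data; the padding sinks to the end.
        sequences.append(chunk[:z])
    return sequences


result = sort_in_rounds([9, 2, 7, 4, 1, 8, 3, 6, 5, 0], j=4)
print(result)
```

With X = 10 and J = 4 the sketch runs 3 rounds, and the last round pads 2 real data with 2 infinity values.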
The two processors provided by the embodiments of the present application are described in detail above in connection with fig. 1-10. The following describes an electronic device provided in an embodiment of the present application with reference to fig. 11.
Embodiments of the present application provide an electronic device that may include one or more of the above-described processors 100 and/or one or more of the above-described processors 500. The electronic device may include, but is not limited to: servers, computers, mobile phones, tablet computers, computers with wireless transceiving functions, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical care, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, wireless terminals in smart homes, vehicle-mounted terminals, roadside units (RSUs) with terminal functions, and the like.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 11, the electronic device 1100 may include a processor 1101. Optionally, the electronic device 1100 may also include a memory 1102 and/or a transceiver 1103. The processor 1101 is coupled to the memory 1102 and the transceiver 1103, for example, through a communication bus.
The following describes the various constituent elements of the electronic device 1100 in detail with reference to fig. 11:
The processor 1101 is the control center of the electronic device 1100, and may be one processor or a collective term for a plurality of processing elements. For example, the processor 1101 is one or more central processing units (CPUs), one or more vector processors, coprocessors or the like, one or more application-specific integrated circuits (ASICs), or one or more integrated circuits configured to implement embodiments of the application, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
Optionally, the processor 1101 may perform various functions of the electronic device 1100 by running or executing software programs stored in the memory 1102 and invoking data stored in the memory 1102.
In a particular implementation, the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 11, as an embodiment.
In a particular implementation, the electronic device 1100 may also include multiple processors, such as the processor 1101 and processor 1104 shown in FIG. 11, as an embodiment. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The memory 1102 is configured to store a software program for executing the solution of the present application, and the processor 1101 controls the execution of the software program, and the specific implementation manner may refer to the method embodiments shown in fig. 12 to 15, respectively, which are not described herein again.
Optionally, the memory 1102 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1102 may be integrated with the processor 1101, or may exist separately and be coupled to the processor 1101 by an interface circuit (not shown in fig. 11) of the electronic device 1100, which is not specifically limited in the embodiments of the present application.
The transceiver 1103 is used for communication with other electronic devices. For example, if the electronic device 1100 is a terminal device, the transceiver 1103 may be used to communicate with a network device or with another terminal device. As another example, if the electronic device 1100 is a network device, the transceiver 1103 may be used to communicate with a terminal device or with another network device.
Optionally, the transceiver 1103 may include a receiver and a transmitter (not separately shown in fig. 11). The receiver is used for realizing the receiving function, and the transmitter is used for realizing the transmitting function.
Optionally, the transceiver 1103 may be integrated with the processor 1101, or may exist separately and be coupled to the processor 1101 by an interface circuit (not shown in fig. 11) of the electronic device 1100, which is not specifically limited in the embodiments of the present application.
It should be noted that the structure of the electronic device 1100 shown in fig. 11 does not constitute a limitation on the electronic device; an actual electronic device may include more or fewer components than shown, may combine some components, or may arrange the components differently.
In addition, the technical effects of the electronic device 1100 may refer to the technical effects of the methods shown in fig. 12-15, which are not described herein.
The embodiment shown in fig. 1 above illustrates a processor 100 provided by the present application, and a sorting method for sorting data using the processor 100 is described below with reference to fig. 12.
Fig. 12 is a flowchart of a sorting method according to an embodiment of the present application. Referring to fig. 12, the method can be applied to the processor 100 shown in fig. 1, and the method includes the following steps:
S1201, the control circuit reads the first instruction from the instruction storage circuit and decodes the first instruction.
S1202, the control circuit sends the decoded first instruction to the sorting circuit.
S1203, the sorting circuit responds to the decoded first instruction and reads the M first sequences according to the storage address.
In one possible design, the sorting circuit 130 may include: a sorting controller 230, a sorter 210, and M first buffers 220, where the sorting controller 230 is coupled to the M first buffers 220, and the M first buffers 220 are coupled to the sorter 210. In S1203, the sorting circuit 130 responding to the decoded first instruction and reading the M first sequences according to the storage address may include: the sorting controller 230 reads the M first sequences according to the storage address and stores the M first sequences in the M first buffers 220, where each first buffer 220 stores one first sequence.
Illustratively, assuming that the M first sequences are stored in global memory, the sorting controller 230 may, in response to the decoded first instruction, send a read instruction to the global memory over the bus according to the storage address, the read instruction instructing the global memory to send the M first sequences to the sorting controller 230. The sorting controller 230 may then receive the M first sequences from the global memory and send one first sequence to each first buffer 220. The M first sequences may all be ordered sequences, or may include unordered sequences.
In addition, the specific embodiment and effect of S1203 may refer to the description of the movement data of the sorting controller 230 in fig. 2, which is not repeated here.
S1204, if the M first sequences are all ordered sequences, the sorting circuit outputs one ordered sequence of length M×N.
In one possible design, S1204 may include: if the M first sequences are all ordered sequences, the sorter 210 reads one ordered sequence of length M×N from the M first buffers 220.
Optionally, if the M first sequences are all ordered sequences, the sorter 210 reads one ordered sequence of length M×N from the M first buffers 220 according to a first reading rule. The first reading rule may be: each time, read out the first-ranked datum among the M first buffers 220, that is, the first-ranked datum among the M ordered sequences. For example, assume there are two ordered sequences, "3, 5, 6, 9" and "7, 13, 25, 26", both in ascending order; the first-ranked datum of the two ordered sequences is then 3. When the first buffers 220 are implemented as input buffers, the sorter 210 may read out one first-ranked datum at a time at the outputs of the M first buffers 220. It should be understood that S1204 implements the function of arranging M ordered sequences into one ordered sequence.
In addition, the specific embodiment and effect of S1204 can refer to the above-mentioned mode 1, and will not be repeated here.
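The first reading rule amounts to an M-way merge. A minimal Python sketch, assuming ascending order and modelling each first buffer 220 as a FIFO queue, is given below; the function name merge_first_rule is hypothetical, and the input reuses the two sequences from the example.

```python
from collections import deque


def merge_first_rule(sequences):
    """Merge M ordered sequences by repeatedly reading out the first-ranked datum."""
    buffers = [deque(sequence) for sequence in sequences]  # one FIFO per first buffer
    merged = []
    while any(buffers):
        # Compare the head of every non-empty buffer and pick the smallest head.
        k = min((i for i, buf in enumerate(buffers) if buf),
                key=lambda i: buffers[i][0])
        merged.append(buffers[k].popleft())
    return merged


merged = merge_first_rule([[3, 5, 6, 9], [7, 13, 25, 26]])
print(merged)
```

The first datum read out is 3, as in the example, and the output has length M×N = 8.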
S1205, if there is an unordered sequence in the M first sequences, the sorting circuit outputs M ordered sequences of length N.
In one possible design, S1205 may include: if there is an unordered sequence in the M first sequences, the sorter 210 reads M ordered sequences of length N from the M first buffers 220.
Optionally, if there is an unordered sequence in the M first sequences, the sorter 210 reads M ordered sequences of length N from the M first buffers 220 according to a second reading rule. The second reading rule may be: each time, read out one ordered sequence of length N from the M first buffers 220. Specifically, the sorter 210 may read x data from each first buffer 220 at a time and then arrange the M×x data into an ordered sequence, where x is a positive integer. For example, let x=1 and M=4, and let the first sequences stored in the 4 first buffers 220 be "10, 8, 20, 3", "1, 25, 33, 7", "6, 4, 16, 23", and "11, 15, 16, 18". In the first read, the sorter 210 reads 1 datum from each of the 4 first buffers 220, that is, "10, 1, 6, 11", and then arranges these 4 data into the ordered sequence "1, 6, 10, 11". Optionally, when the first buffers 220 are implemented as input buffers, the sorter 210 may read x data at the output of each first buffer 220 at a time. It will be appreciated that x may be any integer greater than or equal to 1; as x becomes larger, the sorter 210 reads more data from the M first buffers 220 at a time for sorting, which improves sorting efficiency. It should be understood that S1205 implements the function of arranging M unordered sequences into M ordered sequences.
In addition, the specific implementation and effect of S1205 can refer to the above-mentioned mode 2, and will not be repeated here.
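The second reading rule can be sketched in Python as follows. The sketch assumes that M×x = N, so that each read yields one ordered sequence of length N, and that later reads proceed in the same way as the first read described in the example; only the first read is given in the text. The function name read_second_rule is hypothetical, and sorted stands in for the sorter 210.

```python
def read_second_rule(buffers, x=1):
    """Each read takes x data from every buffer and sorts the M*x data together."""
    n = len(buffers[0])
    outputs = []
    for start in range(0, n, x):
        # Gather x data from the current read position of each first buffer.
        group = [value for buf in buffers for value in buf[start:start + x]]
        outputs.append(sorted(group))  # stand-in for the sorter 210 (ascending)
    return outputs


first_sequences = [
    [10, 8, 20, 3],
    [1, 25, 33, 7],
    [6, 4, 16, 23],
    [11, 15, 16, 18],
]
outputs = read_second_rule(first_sequences, x=1)
for sequence in outputs:
    print(sequence)
```

The first output reproduces "1, 6, 10, 11" from the example, and with M = 4, x = 1 and N = 4 the sketch produces 4 ordered sequences of length 4.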
Modes 1 and 2 above may be implemented independently or in combination. When mode 1 is combined with mode 2, the method may include: first, performing mode 2 to arrange M unordered sequences into M ordered sequences; then, performing mode 1 to arrange the M ordered sequences into one ordered sequence.
Optionally, the sorting circuit 130 further comprises a second buffer 240, the second buffer 240 being coupled to the sorting controller 230. In order to implement the function of temporarily storing data in the second buffer 240, the method may further include:
Step 1: the sorting controller 230 reads the M first sequences according to the storage address and stores the M first sequences in the second buffer 240.
Step 2: the sorting controller 230 moves the M first sequences from the second buffer 240 to the M first buffers 220.
It can be appreciated that, for the specific embodiment and effect of the second buffer 240 temporarily storing data, reference may be made to the description of how the sorting circuit 130 shown in fig. 2 temporarily stores data in the second buffer 240, which is not repeated here.
Further, the sorting controller 230 includes a first read-write controller 231 and a second read-write controller 232. The first read-write controller 231 is coupled to the second read-write controller 232, the second buffer 240 is coupled to the first read-write controller 231 and the second read-write controller 232, and the M first buffers 220 are coupled to the first read-write controller 231. To improve sorting efficiency, step 1 may include: the second read-write controller 232 reads the M first sequences according to the storage address and stores the M first sequences in the second buffer 240. Step 2 may include: the first read-write controller 231 moves the M first sequences from the second buffer 240 to the M first buffers 220. It can be appreciated that, for the specific embodiments and effects of step 1 and step 2, reference may be made to the description of how the sorting circuit 130 shown in fig. 2 continuously moves the data to be sorted, which is not repeated here.
Still further, in order to improve the sorting efficiency, the method may further include:
Step 3: when moving the M first sequences, the first read-write controller 231 sends information on the amount of data moved to the second read-write controller 232.
It can be appreciated that for the specific embodiment and effect of step 3, reference may be made to the description of the sorting circuit 130 shown in fig. 2 related to the transmission of the moved data amount information from the first read-write controller 231 to the second read-write controller 232, which is not repeated herein.
Further, the sorting controller 230 may further include a third read-write controller 233, where the third read-write controller 233 is coupled to the sorter 210 and the second buffer 240, respectively. In order to implement the function of writing out ordered data by the third read-write controller 233, the method may further include:
Step 4: the third read-write controller 233 moves the ordered sequence output by the sorter 210 to the second buffer 240.
It can be appreciated that for the specific embodiment and effect of step 4, reference may be made to the description of the sorting circuit 130 shown in fig. 2 for implementing the function of writing out the ordered data, which is not repeated here.
The embodiments shown in fig. 5-9 above illustrate the structure of another processor 500 provided by the present application, and a data sorting method implemented based on the other processor 500 is described below with reference to fig. 13.
Fig. 13 is a second flowchart of a sorting method according to an embodiment of the present application. Referring to fig. 13, the method can be applied to the processor 500 shown in any one of the implementations of fig. 5-9, and the method includes the following steps:
S1301, the control circuit reads the second instruction from the instruction storage circuit and decodes the second instruction.
S1302, the control circuit sends the decoded second instruction to the sorting circuit.
S1303, the sorting circuit responds to the decoded second instruction and reads the H data according to the storage address.
S1304, the sorting circuit sorts the H data using the i-th sorter in the sorting circuit.
It can be appreciated that the specific embodiments and effects of S1301-S1304 can refer to the use process of the processor 500 shown in any one of the above implementations of fig. 5-9, which is not described herein.
In some possible embodiments, if the i-th sorter 531 includes K comparators, where K is greater than or equal to J/2 and K is a positive integer, then in S1304, the sorting circuit 530 sorting the H data using the i-th sorter 531 in the sorting circuit 530 may include:
The k-th comparator of the K comparators receives two of the H data, compares them, and outputs the two compared data. For specific embodiments and effects, reference may be made to the description of the K comparators of the processor 500 shown in any of the above implementations of fig. 5 to 9, which is not repeated here.
The above figures 1-13 illustrate two processor embodiments, respectively. On the basis of the two processors shown in fig. 1-13, an embodiment of the present application provides yet another ordering method to order data in combination with the two processors.
Fig. 14 is a flowchart illustrating a sorting method according to an embodiment of the present application. Referring to fig. 14, the method may be applied to an electronic device including a processor including the sorting circuit 130 shown in fig. 1 (hereinafter, simply referred to as a first sorting circuit) and the sorting circuit 530 shown in any one of the implementations of fig. 5-9 (hereinafter, simply referred to as a second sorting circuit). The method comprises the following steps:
S1401, based on the second sorting circuit, the data to be sorted is arranged into a plurality of ordered sequences.
For example, assuming 512 data to be sorted, the second sorting circuit may sort 32 data at a time (i.e., j=32), then the 512 data may be divided into 16 groups of 32 data each, and then sort each group of data using the second sorting circuit, outputting 16 ordered sequences of length 32.
S1402, based on the first sorting circuit, arranges the plurality of ordered sequences into one ordered sequence.
For example, assuming that there are 16 ordered sequences of length 32, and the first ordering circuit can arrange the 8 ordered sequences into one ordered sequence at a time (i.e., m=8), the 16 ordered sequences can be divided into 2 groups of 8 ordered sequences each, then the first ordering circuit is used to order the 8 ordered sequences of each group, output 2 ordered sequences of length 256, and finally the first ordering circuit is used to order the 2 ordered sequences of length 256, output 1 ordered sequence of length 512.
The specific implementation process of the second sorting circuit for arranging the plurality of data to be sorted into one ordered sequence may refer to the sorting method shown in fig. 13, and the specific implementation process of the first sorting circuit for arranging the M ordered sequences into one ordered sequence may refer to the sorting method shown in fig. 12, which is not repeated herein.
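The two stages S1401 and S1402 can be sketched in Python using the numbers from the examples (512 data, J = 32, M = 8). In this sketch, sorted stands in for the second sorting circuit and heapq.merge for the first sorting circuit; the function name two_stage_sort and the random test data are hypothetical.

```python
import heapq
import random


def two_stage_sort(data, j=32, m=8):
    """Return (sorted data, number of merge rounds) for the two-stage method."""
    # S1401: the second sorting circuit turns each group of J data into an ordered sequence.
    sequences = [sorted(data[i:i + j]) for i in range(0, len(data), j)]
    merge_rounds = 0
    # S1402: the first sorting circuit merges up to M ordered sequences at a time,
    # repeating until a single ordered sequence remains.
    while len(sequences) > 1:
        sequences = [list(heapq.merge(*sequences[i:i + m]))
                     for i in range(0, len(sequences), m)]
        merge_rounds += 1
    return (sequences[0] if sequences else []), merge_rounds


random.seed(0)
data = [random.randrange(10_000) for _ in range(512)]
result, merge_rounds = two_stage_sort(data)
print(len(result), merge_rounds)
```

With these parameters the 16 ordered sequences of length 32 are merged into 2 sequences of length 256 and then into 1 sequence of length 512, that is, two merge rounds.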
The above figures 1-14 illustrate two processor embodiments, respectively. On the basis of the two processors shown in fig. 1-14, an embodiment of the present application provides a further ordering method to order data to be ordered by using a plurality of processing cores of the processors, so as to further improve the ordering efficiency.
Referring to fig. 15, the method may be applied to an electronic device including a processor, the processor including a plurality of processing cores, each of which includes: the sorting circuit 130 shown in fig. 1 and 2 and the sorting circuit 530 shown in any one of the implementations of fig. 5-9 described above; or each of which includes: the sorting circuit 130 shown in fig. 1 and 2. The method includes the following steps:
S1501, the control circuit reads the second instruction from the instruction storage circuit and decodes the second instruction, where the decoded second instruction includes a storage address of the data to be sorted.
S1502, the control circuit sends the decoded second instruction to the plurality of processing cores.
S1503, the plurality of processing cores respond to the decoded second instruction and read the data to be sorted according to the storage address.
S1504, the plurality of processing cores sort the data to be sorted.
The data to be sorted may include a plurality of sequences, and the sorting circuit is configured to read M sequences of the plurality of sequences, and output an ordered sequence with a length of m×n if the M sequences are all ordered sequences, or output M ordered sequences with a length of N if there is an unordered sequence in the M sequences. The specific implementation process may refer to the process of sorting the data by the sorting circuit 130 shown in fig. 2, which is not described herein.
In one possible design, the sorting of the data to be sorted by the plurality of processing cores may include: the plurality of processing cores iteratively sort the data to be sorted until the data to be sorted is arranged into 1 ordered sequence. In each iteration, the plurality of processing cores arrange the N ordered sequences output by the previous iteration into $\lceil N/M \rceil$ ordered sequences, where M is an integer greater than 1.
Optionally, the plurality of processing cores iteratively sorting the data to be sorted until the data to be sorted is arranged into 1 ordered sequence may include: determining the number E of processing cores to be activated among the plurality of processing cores, where E is an integer greater than 1; and arranging the data to be sorted into 1 ordered sequence based on the E processing cores.
For example, the number E of processing cores to be activated is determined based on the data amount of the data to be sorted (denoted Q), the number of the plurality of processing cores (denoted P), and the minimum processing data amount of each processing core (denoted T). For example, when Q/P is greater than or equal to T, E=P; when Q/P is less than T, $E = \lceil Q/T \rceil$, where $\lceil\ \rceil$ is the round-up symbol; for example, $\lceil Q/T \rceil$ denotes Q/T rounded up.
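The choice of E can be illustrated with a small Python sketch. It also computes the even per-core share F = Q/E that the distribution step uses (F = Q/P when E = P). The function name plan_cores and the sample values of Q, P and T are hypothetical, and the sketch assumes Q > 0.

```python
import math


def plan_cores(q, p, t):
    """Return (E, F): cores to activate and the even share of data per core."""
    if q / p >= t:
        e = p                  # enough data to keep every core above its minimum T
    else:
        e = math.ceil(q / t)   # otherwise activate only as many cores as needed
    f = q / e                  # even split of the Q data over the E cores
    return e, f


print(plan_cores(1_000_000, 8, 1024))
print(plan_cores(3000, 8, 1024))
```

In the second call Q/P = 375 is below T = 1024, so only 3 of the 8 cores are activated and each receives 1000 data.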
Alternatively, based on the E processing cores, arranging the data to be ordered into 1 ordered sequence may include the following steps:
Step 5: distribute the data to be sorted to each of the E processing cores.
Optionally, the data to be sorted is distributed equally to each of the E processing cores. For example, assuming that the data amount of the data to be sorted to which each processing core is assigned is F, f=q/P when Q/P is greater than or equal to T; when Q/P is less than T, f=q/E.
Step 6: each processing core arranges the data to be sorted allocated to it into 1 ordered sequence.
Wherein the E processing cores arrange the allocated data to be ordered into E ordered sequences. When each processing core includes the sorting circuit 130 and the sorting circuit 530, the specific implementation process of each processing core for arranging the data to be sorted into 1 ordered sequence may refer to the sorting method shown in fig. 14; when each processing core includes the sorting circuit 130 shown in fig. 1, the specific implementation process of each processing core for arranging the data to be sorted into 1 ordered sequence may be implemented in combination with the above-mentioned mode 1 and mode 2, so as to arrange M ordered sequences into one ordered sequence, which is not described herein again.
Step 7: based on $\lceil E/M \rceil$ processing cores among the plurality of processing cores, arrange the E ordered sequences into $\lceil E/M \rceil$ ordered sequences, where M is the number of the first buffers 220 in the sorting circuit 130 shown in fig. 2.
Step 8: if $\lceil E/M \rceil > 1$, set E to $\lceil E/M \rceil$ and return to step 7; otherwise, determine that the sorting is completed, and output the ordered sequence corresponding to the data to be sorted.
For example, assuming E=18 and M=4, in the first round of sorting, the 18 ordered sequences are arranged into 5 ordered sequences based on $\lceil 18/4 \rceil = 5$ processing cores. Here, 4 processing cores each arrange 4 of the 18 ordered sequences into 1 ordered sequence, and 1 processing core arranges the remaining 2 of the 18 ordered sequences into 1 ordered sequence. Then, since 5 > 1, E is set to 5 and the second round of sorting is entered, that is, the 5 ordered sequences from the previous round are arranged into 2 ordered sequences based on $\lceil 5/4 \rceil = 2$ processing cores. Thereafter, since 2 > 1, E is set to 2 and the third round of sorting is entered, that is, the 2 ordered sequences from the previous round are arranged into 1 ordered sequence based on $\lceil 2/4 \rceil = 1$ processing core. Finally, since $\lceil 2/4 \rceil = 1$ is not greater than 1, the sorting is determined to be complete, and the 1 ordered sequence corresponding to the data to be sorted is output. In this way, the number of sorting circuits participating in sorting gradually decreases during the sorting of the data to be sorted, which reduces the processing resources occupied.
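Steps 7 and 8 can be sketched in Python as a loop that repeatedly replaces E with E/M rounded up. The sketch below is sequential and only models the schedule; in the described method the merges within one round run on different processing cores. heapq.merge stands in for the sorting circuit 130, and the function names reduction_schedule and multi_core_merge are hypothetical.

```python
import heapq
import math


def reduction_schedule(e, m):
    """Number of processing cores used in each round of steps 7 and 8."""
    counts = []
    while True:
        e = math.ceil(e / m)   # step 7: ceil(E/M) cores produce ceil(E/M) sequences
        counts.append(e)
        if e <= 1:             # step 8: stop once a single ordered sequence remains
            break
    return counts


def multi_core_merge(sequences, m):
    """Merge E ordered sequences, M at a time per (modelled) core, until one remains."""
    while len(sequences) > 1:
        sequences = [list(heapq.merge(*sequences[i:i + m]))
                     for i in range(0, len(sequences), m)]
    return sequences[0]


schedule = reduction_schedule(18, 4)
print(schedule)

ordered_inputs = [[i, i + 18, i + 36] for i in range(18)]  # 18 ordered sequences
merged = multi_core_merge(ordered_inputs, 4)
print(merged[:5], len(merged))
```

For E = 18 and M = 4 the schedule uses 5, then 2, then 1 processing core, matching the three rounds in the example.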
It should be appreciated that if the amount of data to be sorted is very large (for example, more than 1 million data), the sorting method shown in fig. 15 can use multiple processing cores to sort simultaneously, thereby improving sorting efficiency.
In addition, when the method shown in fig. 15 is executed, if the data to be sorted needs to be subjected to TOPK sorting, each processing core may sort the allocated data to be sorted into 1 ordered sequence by the TOPK sorting. The implementation process of the processing core for performing the TOPK sorting may refer to the description of the sorting circuit 130 shown in FIG. 2 and will not be repeated here.
The foregoing is merely a description of specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A processor, comprising: an instruction storage circuit, a control circuit and a sequencing circuit, wherein the instruction storage circuit is coupled with the control circuit, and the control circuit is coupled with the sequencing circuit;
    the control circuit is used for reading a first instruction from the instruction storage circuit and decoding the first instruction; the decoded first instruction comprises M storage addresses of first sequences, the length of each first sequence is N, M is an integer greater than 1, and N is an integer greater than 1;
    the control circuit is further configured to send the decoded first instruction to the sequencing circuit;
    the sequencing circuit is configured to respond to the decoded first instruction by performing the following steps: and reading M first sequences according to the storage address, and outputting an ordered sequence with the length of M x N if the M first sequences are all ordered sequences, or outputting M ordered sequences with the length of N if unordered sequences exist in the M first sequences.
  2. The processor of claim 1, wherein the sequencing circuit comprises: a sequencing controller, a sequencer and M first buffers, wherein the sequencing controller is coupled with the M first buffers, and the M first buffers are coupled with the sequencer;
    the sequencing controller is used for reading M first sequences according to the storage address and storing the M first sequences in M first buffers; each first buffer stores a first sequence;
    and the sequencer is used for reading out one ordered sequence with the length of M x N from the M first buffers if the M first sequences are all ordered sequences, or reading out M ordered sequences with the length of N from the M first buffers if unordered sequences exist in the M first sequences.
  3. The processor of claim 2, wherein
    the sequencer is further configured to read an ordered sequence with a length of M x N from the M first buffers according to a first reading rule if the M first sequences are all ordered sequences; the first reading rule is: reading out first data in the M first buffers each time; or,
    the sequencer is further configured to read M ordered sequences with a length of N from the M first buffers according to a second reading rule if there are unordered sequences in the M first sequences; the second reading rule is: an ordered sequence of length N is read out in the M first buffers each time.
  4. A processor according to claim 2 or 3, wherein the sequencing circuit further comprises: a second buffer coupled to the sequencing controller;
    the sequencing controller is used for reading M first sequences according to the storage address and storing the M first sequences in the second buffer;
    the sequencing controller is further configured to move the M first sequences from the second buffer to the M first buffers.
  5. The processor of claim 4, wherein the sequencing controller comprises a first read-write controller and a second read-write controller, the first read-write controller coupled to the second read-write controller, the second buffer coupled to the first read-write controller and the second read-write controller, respectively, the M first buffers each coupled to the first read-write controller;
    the second read-write controller is configured to read M first sequences according to the storage address, and store the M first sequences in the second buffer;
    the first read-write controller is used for moving the M first sequences from the second buffer to the M first buffers.
  6. The processor of claim 5, wherein
    the first read-write controller is further configured to send information of the moved data amount to the second read-write controller when M first sequences are moved.
  7. The processor of any one of claims 4-6, wherein the sequencing controller further comprises a third read-write controller coupled to the sequencer and the second buffer, respectively;
    and the third read-write controller is used for moving the ordered sequence output by the sequencer to the second buffer.
  8. A sorting method applied to a processor, the processor comprising: an instruction storage circuit, a control circuit and a sequencing circuit, wherein the instruction storage circuit is coupled with the control circuit, and the control circuit is coupled with the sequencing circuit;
    the method comprises the following steps:
    the control circuit reads a first instruction from the instruction storage circuit and decodes the first instruction; the decoded first instruction comprises M storage addresses of first sequences, the length of each first sequence is N, M is an integer greater than 1, and N is an integer greater than 1;
    the control circuit sends the decoded first instruction to the sequencing circuit;
    the sequencing circuit responds to the decoded first instruction, reads M first sequences according to the storage address, and outputs an ordered sequence with the length of M x N if the M first sequences are all ordered sequences, or outputs M ordered sequences with the length of N if unordered sequences exist in the M first sequences.
  9. The method of claim 8, wherein the sequencing circuit comprises: a sequencing controller, a sequencer and M first buffers, wherein the sequencing controller is coupled with the M first buffers, and the M first buffers are coupled with the sequencer;
    the sequencing circuit responds to the decoded first instruction, reads M first sequences according to the storage address, and outputs an ordered sequence with the length of M x N if the M first sequences are all ordered sequences, or outputs M ordered sequences with the length of N if unordered sequences exist in the M first sequences, comprising:
    the sequencing controller reads M first sequences according to the storage address and stores the M first sequences in M first buffers; each first buffer stores a first sequence;
    if the M first sequences are all ordered sequences, the sequencer reads out an ordered sequence with the length of M x N from the M first buffers; or,
    and if unordered sequences exist in the M first sequences, the sequencer reads out M ordered sequences with the length of N from the M first buffers.
  10. The method of claim 9, wherein if the M first sequences are all ordered sequences, the sequencer reads an ordered sequence of length M x N from the M first buffers, comprising:
    if the M first sequences are all ordered sequences, the sequencer reads an ordered sequence with the length of M x N from the M first buffers according to a first reading rule; the first reading rule is: reading out first data in the M first buffers each time;
    and if unordered sequences exist in the M first sequences, the sequencer reads M ordered sequences with the length of N from the M first buffers, comprising:
    if unordered sequences exist in the M first sequences, the sequencer reads M ordered sequences with the length of N from the M first buffers according to a second reading rule; the second reading rule is: an ordered sequence of length N is read out in the M first buffers each time.
  11. The method of claim 9 or 10, wherein the sequencing circuit further comprises: a second buffer coupled to the sequencing controller;
    the sequencing controller reads the M first sequences according to the storage address, and stores the M first sequences in the M first buffers, comprising:
    the sequencing controller reads M first sequences according to the storage address and stores the M first sequences in the second buffer;
    the sequencing controller moves the M first sequences from the second buffer to the M first buffers.
  12. The method of claim 11, wherein the sequencing controller comprises a first read-write controller and a second read-write controller, the first read-write controller coupled to the second read-write controller, the second buffer coupled to the first read-write controller and the second read-write controller, respectively, the M first buffers each coupled to the first read-write controller;
    the sequencing controller reads the M first sequences according to the storage address, and stores the M first sequences in the second buffer, comprising:
    the second read-write controller reads M first sequences according to the storage address and stores the M first sequences in the second buffer;
    the sequencing controller moves the M first sequences from the second buffer to the M first buffers, comprising:
    the first read-write controller moves the M first sequences from the second buffer to the M first buffers.
  13. The method according to claim 12, wherein the method further comprises:
    when M first sequences are moved, the first read-write controller sends the moved data amount information to the second read-write controller.
  14. The method of any of claims 11-13, wherein the sequencing controller further comprises a third read-write controller coupled to the sequencer and the second buffer, respectively;
    the method further comprises the steps of:
    and the third read-write controller moves the ordered sequence output by the sequencer to the second buffer.
  15. An electronic device comprising the processor of any of claims 1-7.
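The input/output behaviour recited for the sequencing circuit in claim 1 (and mirrored in method claim 8) can be modelled in software roughly as follows. This is a sketch under stated assumptions, not the claimed circuit: the names `is_ordered` and `sort_instruction` are hypothetical, ascending order is assumed because the claims do not fix a direction, and the buffers and read-write controllers of claims 2-7 are not modelled.

```python
import heapq


def is_ordered(seq):
    """Return True if seq is in ascending (non-decreasing) order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def sort_instruction(first_sequences):
    """Model of the single instruction operating on M sequences of length N.

    If every input sequence is already ordered, merge them into one
    ordered sequence of length M x N. Otherwise, sort each input on its
    own and return M ordered sequences of length N.
    """
    if all(is_ordered(seq) for seq in first_sequences):
        return [list(heapq.merge(*first_sequences))]
    return [sorted(seq) for seq in first_sequences]


print(sort_instruction([[1, 4], [2, 3]]))  # [[1, 2, 3, 4]]
print(sort_instruction([[4, 1], [2, 3]]))  # [[1, 4], [2, 3]]
```

In this model, a first call on unordered data yields M ordered sequences, and a second call on those results merges them, so the same instruction serves both the sorting and the merging phase.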
CN202180088003.3A 2021-03-18 2021-03-18 Processor, sorting method and electronic equipment Pending CN116670639A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/081638 WO2022193259A1 (en) 2021-03-18 2021-03-18 Processor, sorting method, and electronic device

Publications (1)

Publication Number Publication Date
CN116670639A true CN116670639A (en) 2023-08-29

Family

ID=83321355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180088003.3A Pending CN116670639A (en) 2021-03-18 2021-03-18 Processor, sorting method and electronic equipment

Country Status (2)

Country Link
CN (1) CN116670639A (en)
WO (1) WO2022193259A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013147880A1 (en) * 2012-03-30 2013-10-03 Intel Corporation Method and apparatus of instruction that merges and sorts smaller sorted vectors into larger sorted vector
US9740659B2 (en) * 2014-03-19 2017-08-22 International Business Machines Corporation Merging and sorting arrays on an SIMD processor
CN106250097A (en) * 2016-06-22 2016-12-21 中国科学院计算技术研究所 A kind of acceleration collator towards big data, method, chip, processor

Also Published As

Publication number Publication date
WO2022193259A1 (en) 2022-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination