CN111258632B - Data selection device, data processing method, chip and electronic equipment - Google Patents

Data selection device, data processing method, chip and electronic equipment

Info

Publication number
CN111258632B
Authority
CN
China
Prior art keywords
data
comparison
multiplexing
circuit
comparison operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811450573.9A
Other languages
Chinese (zh)
Other versions
CN111258632A (en)
Inventor
Inventor name withheld from publication
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811450573.9A priority Critical patent/CN111258632B/en
Priority to PCT/CN2019/120994 priority patent/WO2020108486A1/en
Publication of CN111258632A publication Critical patent/CN111258632A/en
Application granted granted Critical
Publication of CN111258632B publication Critical patent/CN111258632B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098: Register arrangements
    • G06F9/3012: Organisation of register space, e.g. banked or distributed register file
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Logic Circuits (AREA)

Abstract

The application provides a data selection device, a data processing method, a chip and an electronic device. The data selection device can perform multi-layer cyclic comparison operations on data, which effectively reduces the delay inside the data selection device; in addition, the device can handle comparison operations on data of different bit widths according to the function selection mode signal received by the multiplexing comparison tree circuit, which effectively reduces the area of the AI chip occupied by the data selection device.

Description

Data selection device, data processing method, chip and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data selection device, a data processing method, a chip, and an electronic device.
Background
With the continuous development of digital electronic technology, rapidly evolving Artificial Intelligence (AI) chips place increasingly high demands on high-performance digital comparators. Neural network algorithms are among the algorithms most widely used on intelligent chips, and selecting an extremum from a plurality of values is a common operation in such algorithms.
In general, selecting the maximum value and the minimum value of a plurality of data requires many comparison operations before the extremum is determined, which increases the delay inside the vector data selection device.
Meanwhile, in the conventional technology, data of different bit widths must be compared by different data selection devices, so the data selection devices occupy a large area of the AI chip.
Disclosure of Invention
In view of the above, it is desirable to provide a data selection device, a data processing method, a chip, and an electronic apparatus.
An embodiment of the present invention provides a data selection device, which includes: a data read-in circuit, a multiplexing comparison tree circuit, an extremum register circuit and an end judgment circuit; the output end of the data read-in circuit is connected to the first input end of the multiplexing comparison tree circuit, the first output end of the multiplexing comparison tree circuit is connected to the first input end of the extremum register circuit, the first output end of the extremum register circuit is connected to the input end of the end judgment circuit, the output end of the end judgment circuit is connected to the second input end of the extremum register circuit, and the second output end of the extremum register circuit is connected to the second input end of the multiplexing comparison tree circuit;
the data read-in circuit is configured to receive the number N of data and the first addresses of a plurality of storage intervals in the register, and to read in the data according to those first addresses and the number N; the multiplexing comparison tree circuit is configured to compare the magnitudes of the received data over multiple cyclic layers; the extremum register circuit is configured to store the extremum obtained by each layer of cyclic comparison; and the end judgment circuit is configured to judge whether the multi-layer cyclic comparison processing has ended.
In one embodiment, the multiplexing comparison tree circuit includes a function selection mode signal input end for receiving an input function selection mode signal; the function selection mode signal determines the bit width of the data processed by the data selection device.
In one embodiment, the data read-in circuit includes: a data reading unit and a scalar register array, wherein the output end of the data reading unit is connected to the input end of the scalar register array;
the data reading unit is configured to receive the number N of the data and the first addresses of a plurality of storage intervals in the register, and to read in the data according to those first addresses and the number N; the scalar register array is configured to store the data read in by the data reading unit according to the addresses of the plurality of storage intervals in the register.
In one embodiment, the data reading unit in the data read-in circuit includes: a data input port, a data-number and first-address input port, and a data output port; the data input port is configured to read in the data according to the first addresses of a plurality of storage intervals in the register and the number N of the data, the data-number and first-address input port is configured to receive the number N of the data to be read in and the first addresses of the plurality of storage intervals in the register, and the data output port is configured to output the data that has been read in;
the scalar register array in the data read-in circuit includes: a data input port, a first data output port, a second data output port and a remaining data output port; the data input port is configured to receive the N data, the first data output port and the second data output port are each configured to output the data stored in each register storage interval, and the remaining data output port is configured to output the data remaining in each register storage interval during a comparison operation.
In one embodiment, the multiplexing comparison tree circuit includes: a first-stage multiplexing comparator and a second-stage multiplexing comparator, each of which is configured to compare two data to obtain an extremum.
In one embodiment, the first-stage multiplexing comparator in the multiplexing comparison tree circuit includes: a multiplexing comparator configured to perform a cyclic comparison operation on the data stored in the storage intervals of the register to obtain a maximum value vector and a minimum value vector; the second-stage multiplexing comparator in the multiplexing comparison tree circuit includes: a first multiplexing comparator and a second multiplexing comparator, wherein the first multiplexing comparator is configured to compare two data to obtain a maximum value, and the second multiplexing comparator is configured to compare two data to obtain a minimum value.
In one embodiment, the multiplexing comparator, the first multiplexing comparator or the second multiplexing comparator includes: a function selection mode signal input port, a first data input port, a second data input port, a maximum output port and a minimum output port; the function selection mode signal input port is configured to receive the function selection mode signal corresponding to the bit width of the data to be processed, the first data input port is configured to receive the first input data, the second data input port is configured to receive the second input data, the maximum output port is configured to output the maximum value after each data comparison operation, and the minimum output port is configured to output the minimum value after each data comparison operation.
In one embodiment, the extremum register circuit includes: a maximum register file and a minimum register file, wherein the maximum register file is configured to store the maximum value obtained by the multi-layer cyclic comparison operation, and the minimum register file is configured to store the minimum value obtained by the multi-layer cyclic comparison operation.
In one embodiment, the maximum register file in the extremum register circuit includes: a first maximum output port, a second maximum output port, a maximum input port, a third maximum output port, a comparison level output port, a judgment result input port, a remaining data input port and a further maximum input port; the first maximum output port is configured to output a first maximum value; the second maximum output port is configured to output a second maximum value; the maximum input port is configured to receive the maximum value obtained by the next comparison operation; the third maximum output port is configured to output the maximum value among the plurality of data; the comparison level output port is configured to output the number of the layer of cyclic comparison currently performed by the multiplexing comparison tree circuit; the judgment result input port is configured to receive a logic judgment signal; the remaining data input port is configured to receive the remaining data stored in the register storage intervals during the comparison operation; and the further maximum input port is configured to receive the maximum value obtained after each data comparison operation;
the minimum register file in the extremum register circuit includes: a first minimum output port, a second minimum output port, a minimum input port, a third minimum output port, a comparison level output port, a judgment result input port, a remaining data input port and a further minimum input port; the first minimum output port is configured to output a first minimum value; the second minimum output port is configured to output a second minimum value; the minimum input port is configured to receive the minimum value obtained by the next layer of comparison operation; the third minimum output port is configured to output the minimum value among the plurality of data; the comparison level output port is configured to output the number of the layer of comparison currently performed by the second multiplexing comparator; the judgment result input port is configured to receive the logic judgment signal output by the end judgment circuit; the remaining data input port is configured to receive the remaining data stored in the register storage intervals during the comparison operation; and the further minimum input port is configured to receive the minimum value obtained after each data comparison operation.
In one embodiment, the end judgment circuit includes: a judging unit configured to compare the number of the layer that produced the current extremum comparison result with the total number of layers of cyclic comparison that the multiplexing comparison tree circuit must perform to obtain the final extremum.
In one embodiment, the judging unit includes: a comparison level input port and a judgment result output port; the comparison level input port is configured to receive, from the extremum register circuit, the number of the layer corresponding to the currently obtained cyclic comparison result, and the judgment result output port is configured to output the result of comparing that layer number with the total number of layers of cyclic comparison that the multiplexing comparison tree circuit must perform.
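The judgment described above can be sketched as follows. The patent text does not state how the total number of layers is obtained; the sketch assumes a pairwise comparison tree, for which reducing N values to one takes ceil(log2(N)) layers. The names `total_levels` and `end_judgment` are hypothetical.

```python
def total_levels(n):
    """Layers a pairwise comparison tree needs to reduce n values to one.

    Assumes ceil(log2(n)), computed exactly with integer arithmetic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def end_judgment(current_level, n):
    """Logic judgment signal: True once the required layer count is reached."""
    return current_level >= total_levels(n)
```

For N = 8 the assumed total is 3 layers, so `end_judgment(2, 8)` is `False` and `end_judgment(3, 8)` is `True`.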
According to the data selection device provided by the embodiment, a plurality of data are read in through the data reading circuit, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on the plurality of data to obtain a final extreme value, and when the judgment result of the ending judgment circuit is yes, the result of the comparison operation is output through the extreme value register circuit, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the delay inside the data selection device is effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
The embodiment of the invention provides a data processing method, which comprises the following steps:
receiving data to be processed;
performing multi-layer cyclic comparison operation on the data to be processed through a multiplexing comparison tree circuit;
judging whether the condition for finishing the multilayer circulation comparison operation is met or not through a finishing judgment circuit;
and if the condition for finishing the multilayer cyclic comparison operation is met, outputting a vector extreme value.
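The four steps above can be sketched as a software model. This is only a behavioural illustration under stated assumptions, not the claimed circuit: each layer compares neighbouring values pairwise, an unpaired last value is carried forward unchanged, and the end condition is taken to be "a single value remains". All function names are hypothetical.

```python
def compare_level(values, pick):
    """One layer of the comparison tree: reduce neighbouring pairs with pick.

    A trailing unpaired value forms a one-element slice and is carried
    forward unchanged (an assumption about odd-length inputs).
    """
    return [pick(values[i:i + 2]) for i in range(0, len(values), 2)]


def end_of_comparison(values):
    """End judgment: the cyclic comparison ends once one value is left."""
    return len(values) == 1


def select_extrema(data):
    """Return (maximum, minimum, layers) for a non-empty sequence."""
    if not data:
        raise ValueError("data must not be empty")
    maxima, minima, layers = list(data), list(data), 0
    while not end_of_comparison(maxima):
        maxima = compare_level(maxima, max)
        minima = compare_level(minima, min)
        layers += 1
    return maxima[0], minima[0], layers
```

For example, `select_extrema([7, -2, 9, 4, 0])` returns `(9, -2, 3)`: five values are reduced in three layers rather than in four sequential comparisons per extremum.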
In one embodiment, after receiving the data to be processed, the method further includes:
receiving the number N of the data to be processed and the first addresses of a plurality of storage intervals in a register through a data reading unit;
reading the data according to the first addresses of a plurality of storage intervals in the register and the number N of the data, and storing the data into a scalar register array.
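A minimal sketch of this read-in step is given below. The patent text does not specify the memory layout, so the sketch assumes a flat list as the register, storage intervals of a fixed size, and intervals that are read in the order their first addresses are given until N values have been collected. The name `read_data` and the `interval_size` parameter are hypothetical.

```python
def read_data(memory, first_addresses, n, interval_size):
    """Read n values from the storage intervals into a scalar register array.

    memory          -- flat list standing in for the register (assumption)
    first_addresses -- first address of each storage interval
    n               -- number N of data to read in
    interval_size   -- values per storage interval (assumption)
    """
    scalar_registers = []
    for start in first_addresses:
        for offset in range(interval_size):
            if len(scalar_registers) == n:
                return scalar_registers
            scalar_registers.append(memory[start + offset])
    if len(scalar_registers) < n:
        raise ValueError("storage intervals hold fewer than n values")
    return scalar_registers
```

With `memory = list(range(100, 120))`, the call `read_data(memory, [0, 8, 16], 10, 4)` collects four values from each of the first two intervals and two from the third.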
In one embodiment, the multiplexing comparison tree circuit performs a multi-level cyclic comparison operation on the data to be processed, including:
performing first-layer cyclic comparison operation on the data to be processed through a first-stage multiplexing comparator to obtain a first-layer extreme value comparison result;
and carrying out multi-layer cyclic comparison operation on the first-layer extreme value comparison result through a second-stage multiplexing comparator.
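The division of work between the two stages might be modelled as below. The sketch assumes that the first stage compares each pair of input data once and yields both a maximum vector and a minimum vector, that the second stage then reduces each vector cyclically with a two-input comparator, and that an unpaired value is passed through as remaining data. These details, and the names `first_stage` and `second_stage`, are assumptions made for illustration.

```python
def first_stage(data):
    """First-layer comparison: one comparison per pair gives both extrema."""
    maxima, minima = [], []
    for i in range(0, len(data) - 1, 2):
        a, b = data[i], data[i + 1]
        if a >= b:
            maxima.append(a)
            minima.append(b)
        else:
            maxima.append(b)
            minima.append(a)
    if len(data) % 2:  # unpaired remaining datum goes to both vectors
        maxima.append(data[-1])
        minima.append(data[-1])
    return maxima, minima


def second_stage(values, pick):
    """Cyclically reduce a non-empty vector with a two-input comparator."""
    layers = 0
    while len(values) > 1:
        reduced = [pick(values[i], values[i + 1])
                   for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # carry the unpaired value to the next layer
            reduced.append(values[-1])
        values = reduced
        layers += 1
    return values[0], layers
```

Calling `first_stage([3, 8, 1, 6, 5])` gives the vectors `[8, 6, 5]` and `[3, 1, 5]`; passing them to `second_stage` with `max` and `min` respectively yields `(8, 2)` and `(1, 2)`.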
In one embodiment, the determining, by the end determining circuit, whether a condition for ending the multi-layer loop comparison operation is satisfied includes:
acquiring the number of layers corresponding to the extreme value comparison result obtained by the current comparison operation of the second-stage multiplexing comparator through the ending judgment circuit;
and judging whether the multilayer cyclic comparison operation meets the condition of finishing the multilayer cyclic comparison operation or not according to the number of layers of the current extreme value comparison result.
In one embodiment, after the end judgment circuit judges whether the condition for ending the multi-layer cyclic comparison operation is satisfied, the method further includes: if the condition is not satisfied, continuing to compare, through the second-stage multiplexing comparator, the extremum comparison results obtained by the previous layer of cyclic comparison, until the result of the last layer of cyclic comparison is a single datum, and then ending the operation and outputting the vector extremum.
In one embodiment, outputting a vector extremum if a condition for ending the multi-level cyclic comparison operation is satisfied includes: and receiving the logic judgment signal input by the judgment unit through an extremum register circuit, and outputting an operation result according to the logic judgment signal.
In the data processing method provided by this embodiment, to-be-processed data is received, the to-be-processed data is input to a multiplexing comparison tree circuit, the to-be-processed data is subjected to cyclic comparison through the multiplexing comparison tree circuit, a termination judgment circuit judges whether a condition for terminating multilayer cyclic comparison operation is satisfied, if the condition for terminating the comparison operation is satisfied, a vector extremum is output, an extremum in a plurality of to-be-processed data can be obtained through multilayer cyclic comparison operation in the process, and delay inside a data selection device is effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
The embodiment of the invention provides a machine learning arithmetic device, which comprises one or more data selection devices of the first aspect; the machine learning arithmetic device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning arithmetic and transmitting an execution result to other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of data selection devices, the data selection devices can be linked through a specific structure and transmit data;
the data selection devices are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operation; the data selection devices share the same control system or own respective control systems; the data selection devices share a memory or own respective memories; the interconnection mode of the plurality of data selection devices is any interconnection topology.
The combined processing device provided by the embodiment of the invention comprises the machine learning processing device, the universal interconnection interface and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user; the combined processing device may further include a storage device, which is connected to the machine learning arithmetic device and the other processing device, respectively, and is configured to store data of the machine learning arithmetic device and the other processing device.
The neural network chip provided by the embodiment of the invention comprises the data selection device, the machine learning operation device or the combined processing device.
The neural network chip packaging structure provided by the embodiment of the invention comprises the neural network chip.
The board card provided by the embodiment of the invention comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
An embodiment of the present invention provides a chip, including at least one data selection device as described in any one of the above.
The electronic equipment provided by the embodiment of the invention comprises the chip.
Drawings
FIG. 1 is a schematic diagram of a data selection device;
FIG. 2 is a schematic diagram of another data selection apparatus;
FIG. 3 is a schematic structural diagram of a data selection apparatus according to an embodiment;
FIG. 4 is a schematic structural diagram of another data selection apparatus according to an embodiment;
FIG. 5 is a schematic diagram of a specific structure of a multiplexing comparator;
FIG. 6 is a flowchart illustrating a data processing method according to an embodiment;
FIG. 7 is a schematic diagram illustrating a detailed process of reading data by a data reading circuit according to another embodiment;
FIG. 8 is a schematic diagram illustrating a flow chart of a multi-level cyclic comparison operation performed on data by the multiplexing compare tree circuit according to another embodiment;
FIG. 9 is a flowchart illustrating a method for determining whether a condition for ending the loop comparison operation is satisfied according to another embodiment;
FIG. 10 is a flow chart illustrating another data processing method according to an embodiment;
FIG. 11 is a flowchart illustrating a method for gating the comparison data input to the multiplexing comparison tree circuit according to one embodiment;
FIG. 12 is a schematic diagram illustrating a flow chart of a multi-level loop comparison operation performed on the comparison data after being strobed by the multiplexing comparison tree circuit according to another embodiment;
FIG. 13 is a block diagram of a combined processing device according to an embodiment;
FIG. 14 is a block diagram of another alternative combination processing device according to an embodiment;
FIG. 15 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data selection device provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip, or other hardware circuit devices for vector comparison operation processing, and the specific structural schematic diagrams thereof are shown in fig. 1 and 2.
FIG. 1 is a block diagram of a data selection apparatus according to an embodiment. As shown in FIG. 1, the data selection apparatus includes: a data read-in circuit 11, a multiplexing comparison tree circuit 12, an extremum register circuit 13 and an end judgment circuit 14; the output end of the data read-in circuit 11 is connected to the first input end of the multiplexing comparison tree circuit 12, the first output end of the multiplexing comparison tree circuit 12 is connected to the first input end of the extremum register circuit 13, the first output end of the extremum register circuit 13 is connected to the input end of the end judgment circuit 14, the output end of the end judgment circuit 14 is connected to the second input end of the extremum register circuit 13, and the second output end of the extremum register circuit 13 is connected to the second input end of the multiplexing comparison tree circuit 12. The data read-in circuit 11 is configured to receive the number N of data and the first addresses of a plurality of storage intervals in the register, and to read in the data according to those first addresses and the number N; the multiplexing comparison tree circuit 12 is configured to compare the magnitudes of the received data over multiple cyclic layers; the extremum register circuit 13 is configured to store the extremum obtained by each layer of cyclic comparison; and the end judgment circuit 14 is configured to judge whether the multi-layer cyclic comparison processing has ended.
Specifically, the data reading circuit 11 may include a plurality of data reading units having different functions, the multiplexing comparison tree circuit 12 may include a plurality of multiplexing comparators, and the extremum register circuit 13 may include a maximum value processing unit and a minimum value processing unit. Optionally, there may be one or more input ports of the data reading units with different functions, the function of each input port of each data reading unit may be different, there may also be one or more output ports, the function of each output port of each data reading unit may be different, and the circuit structures of the data reading units with different functions may be different. Optionally, the circuit structures of the multiple multiplexing comparators may be the same, and the functions of the input port and the output port of each multiplexing comparator may be the same.
It should be noted that there may be a plurality of input ports of the maximum processing unit and the minimum processing unit, and the function of each input port may be different, and there may also be a plurality of output ports of the maximum processing unit and the minimum processing unit, and the function of each output port may be different.
Optionally, the multiplexing comparison tree circuit 12 may include a function selection mode signal input terminal Mode for receiving an input function selection mode signal. Optionally, the function selection mode signal is used to determine the bit width of the data processed by the data selection device.
Optionally, the function selection mode signal may be multiple, and the multiplexing comparator corresponding to different function selection mode signals may process data with different bit widths.
According to the data selection device provided by the embodiment, a plurality of data are read in through the data reading circuit, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on the plurality of data to obtain a final extreme value, and when the judgment result of the ending judgment circuit is yes, the comparison operation result is output through the extreme value register circuit, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
FIG. 2 is a block diagram of another data selection apparatus according to an embodiment. As shown in FIG. 2, the data selection apparatus includes: a data reading circuit 11, a selection circuit 12, a multiplexing comparison tree circuit 13, an extremum register circuit 14 and an end judgment circuit 15; the output end of the data reading circuit 11 is connected to the first input end of the selection circuit 12, the output end of the selection circuit 12 is connected to the first input end of the multiplexing comparison tree circuit 13, the output end of the multiplexing comparison tree circuit 13 is connected to the first input end of the extremum register circuit 14, the first output end of the extremum register circuit 14 is connected to the second input end of the selection circuit 12, the second output end of the extremum register circuit 14 is connected to the input end of the end judgment circuit 15, the output end of the end judgment circuit 15 is connected to the second input end of the extremum register circuit 14, and the third output end of the extremum register circuit 14 is connected to the second input end of the multiplexing comparison tree circuit 13. The data reading circuit 11 is configured to receive the number N of data and the first addresses of a plurality of storage intervals in a register, and to read in the data according to those first addresses and the number N; the selection circuit 12 is configured to gate the two data that the multiplexing comparison tree circuit 13 receives for each cyclic comparison operation; the multiplexing comparison tree circuit 13 is configured to compare the magnitudes of the received data over multiple cyclic layers; the extremum register circuit 14 is configured to store the extremum obtained by each layer of cyclic comparison; and the end judgment circuit 15 is configured to judge whether the multi-layer cyclic comparison processing has ended.
Specifically, the data reading circuit 11 may include a plurality of data reading units having different functions, and the selection circuit 12 may select, for each comparison operation, whether the two data received by the multiplexing comparison tree circuit 13 are supplied by the extremum register circuit 14 or by the data reading circuit 11. Optionally, the multiplexing comparison tree circuit 13 may include a plurality of multiplexing comparators, and the extremum register circuit 14 may include a maximum value processing unit and a minimum value processing unit. Optionally, each of the data reading units may have one or more input ports and one or more output ports, the function of each input port and of each output port may be different, and the circuit structures of data reading units with different functions may be different. Optionally, the circuit structures of the multiplexing comparators may be the same, and the functions of the input port and the output port of each multiplexing comparator may be the same.
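The gating performed by the selection circuit can be sketched as a simple multiplexer in front of the comparison step. The patent text does not state the rule by which a source is chosen; the sketch assumes that the first layer takes its operands from the data reading circuit and that every later layer takes them from the results held in the extremum register circuit. Only the maximum path is modelled, and the names `gate` and `cyclic_max` are hypothetical.

```python
def gate(level, from_read_in, from_registers):
    """Selection circuit: choose where the comparator operands come from.

    Assumed rule: layer 0 uses freshly read data, later layers use the
    results fed back from the extremum registers.
    """
    return from_read_in if level == 0 else from_registers


def cyclic_max(data):
    """Maximum of a non-empty sequence using gated, layer-by-layer comparison."""
    if not data:
        raise ValueError("data must not be empty")
    registers = []  # stands in for the extremum register contents
    level = 0
    while True:
        operands = gate(level, list(data), registers)
        if len(operands) == 1:  # end judgment: a single value remains
            return operands[0]
        # Compare neighbouring pairs; an unpaired value is carried forward.
        registers = [max(operands[i:i + 2])
                     for i in range(0, len(operands), 2)]
        level += 1
```

Feeding results back through the same comparators is what allows one comparison tree to be reused for every layer, at the cost of one multiplexer on its inputs.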
It should be noted that there may be a plurality of input ports of the maximum processing unit and the minimum processing unit, and the function of each input port may be different, and there may also be a plurality of output ports of the maximum processing unit and the minimum processing unit, and the function of each output port may be different.
Optionally, the multiplexing comparison tree circuit 13 may include a function selection Mode signal input terminal Mode for receiving an input function selection Mode signal.
Optionally, there may be multiple function selection mode signals, and under different function selection mode signals the multiplexing comparators may process data with different bit widths.
According to the data selection device provided by this embodiment, a plurality of data are read in through the data reading circuit, the multiplexing comparison tree circuit performs multi-layer cyclic comparison on the read-in data to obtain a final extremum, and when the end judgment circuit determines that the processing has ended, the result of the comparison operation is output through the extremum register circuit. The plurality of data can thus be subjected to multi-layer cyclic comparison processing to obtain the maximum value and the minimum value among them, which effectively reduces the amount of operation and the delay inside the data selection device; in addition, the data selection device can process comparison operations on data of various bit widths according to the different function selection mode signals received by the multiplexing comparison tree circuit, which effectively reduces the area of the AI chip occupied by the data selection device.
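The overall behaviour described above can be modelled in a few lines of software. The sketch below is not the circuit itself: `select_extrema` is a hypothetical name, adjacent values are paired (an assumption), and an unpaired value is simply carried into the next layer. The loop condition plays the role of the end judgment circuit, and the two lists play the role of the extremum register circuit.

```python
def select_extrema(data):
    """Return (maximum, minimum, layers) using layered pairwise comparison."""
    if not data:
        raise ValueError("at least one datum is required")
    maxima = list(data)   # candidates still competing for the maximum
    minima = list(data)   # candidates still competing for the minimum
    layers = 0
    # End judgment: stop once a layer leaves a single value.
    while len(maxima) > 1:
        next_max, next_min = [], []
        for i in range(0, len(maxima) - 1, 2):
            next_max.append(max(maxima[i], maxima[i + 1]))
            next_min.append(min(minima[i], minima[i + 1]))
        if len(maxima) % 2 == 1:
            # An unpaired value takes part in the next layer unchanged.
            next_max.append(maxima[-1])
            next_min.append(minima[-1])
        maxima, minima = next_max, next_min
        layers += 1
    return maxima[0], minima[0], layers
```

For example, `select_extrema([3, 9, 1, 7, 5])` goes through three layers before a single maximum and a single minimum remain.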
Fig. 3 is a schematic structural diagram of a data selection apparatus according to another embodiment, where the data selection apparatus includes the data reading circuit 11, and the data reading circuit 11 includes: the output end of the data reading unit 111 is connected with the input end of the scalar register array 112; the data reading unit 111 is configured to receive the number N of the data and the first addresses of the multiple storage sections in the register, and read in the data according to the first addresses of the multiple storage sections in the register and the number N of the data, and the scalar register array 112 is configured to store the data read in by the data reading unit 111 according to the addresses of the multiple storage sections in the register.
Specifically, the scalar register array 112 may include a plurality of storage sections, and the number of the storage sections may be equal to the number N of data received by the data reading unit 111. Optionally, each storage interval may store one piece of data, and each storage interval may store any received piece of data. Alternatively, the processing of the next circuit may be performed when all of the N data are stored in the scalar register array 112. The data reading unit 111 may sequentially read N data according to the first addresses a of the plurality of storage sections in the register.
In the data selection device provided by this embodiment, the data reading unit receives the number of data and reads in the data according to the first addresses of the plurality of storage sections in the register and the number of data; the scalar register array receives the data read in by the data reading unit and sequentially stores them into the storage sections according to the first addresses of the storage sections; the data stored in the storage sections are sequentially input to the multiplexing comparison tree circuit, which performs multi-layer cyclic comparison to obtain a final extremum. The plurality of data can thus be subjected to multi-layer cyclic comparison processing to obtain the maximum value and the minimum value among them, which effectively reduces the amount of operation and the delay inside the data selection device; in addition, the data selection device can process comparison operations on data of various bit widths according to the different function selection mode signals received by the multiplexing comparison tree circuit, which effectively reduces the area of the AI chip occupied by the data selection device.
In one embodiment, continuing with the specific structural diagram of the data selecting apparatus shown in fig. 3, the data selecting apparatus includes the data reading unit 111, and the data reading unit 111 includes: the data input port 1111 is used for reading in the data according to the head addresses of a plurality of storage intervals in the register and the number N of the data, the data number input port 1112 is used for receiving the number N of the read-in data and the head addresses of a plurality of storage intervals in the register, and the data output port 1113 is used for outputting the read-in data.
It should be noted that if all the storage sections in the register are numbered, for example 0, 1, 2, 3, …, the first address a of the storage sections may be 0. Optionally, N may be any positive integer, and the specific value of N may be equal to the number of data received by the data reading unit 111. Optionally, the data number input port 1112 may receive the number of data to be read in by the data reading unit 111. Optionally, the data output port 1113 may output one data at a time according to the first address of the storage section, and the number of times data is output may be equal to the number N of data.
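As a rough software analogy of the data reading unit, the sketch below reads N data starting at a first address a and emits them one at a time. It assumes a single first address and consecutively numbered storage sections; the function name `read_data` and the use of a Python list as the register are illustrative only.

```python
def read_data(register, first_address, n):
    """Yield n data one at a time, starting at first_address.

    Assumes the n data occupy consecutively numbered storage sections.
    """
    if n < 1:
        raise ValueError("N must be a positive integer")
    if first_address < 0 or first_address + n > len(register):
        raise IndexError("requested range lies outside the register")
    for offset in range(n):
        # One datum per output, N outputs in total.
        yield register[first_address + offset]
```

With a register holding `[7, 3, 9, 4]`, a first address of 0 and N = 3, the unit outputs 7, 3 and 9 in that order.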
According to the data selection device provided by the embodiment, the number of data can be received through the data reading unit, the data is read in according to the first addresses of a plurality of storage intervals in the register and the number of the data, the received data is sequentially stored in the storage intervals according to the first addresses of the storage intervals through the scalar register array, the data stored in the storage intervals are sequentially input into the multiplexing comparison tree circuit, and the multiplexing comparison tree circuit is used for carrying out multi-layer cyclic comparison to obtain a final extreme value, so that multi-layer cyclic comparison processing can be carried out on the plurality of data to obtain a maximum value and a minimum value, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 3, the data selection apparatus includes the scalar register array 112, and the scalar register array 112 includes: the data input port 1121 is used for receiving N data, the first data output port 1122 is used for outputting the data stored in each register storage interval during each comparison operation, the second data output port 1123 is used for outputting the data stored in each register storage interval, and the remaining data output port 1124 is used for outputting the remaining data stored in the register storage interval during the comparison operation.
Specifically, the data input port 1121 may receive N data, where each time one data can be received, each time N data can also be received, but only one data stored in one register storage interval can be output each time, and the number of times of outputting data may be equal to N. Optionally, one register memory interval may store one data. Alternatively, the number of register storage sections in the scalar register array 112 may be equal to the number N of data received by the data reading unit 111.
It should be noted that, when the multiplexing comparison tree circuit 12 performs the cyclic comparison operation, if the first data output port 1122 and the second data output port 1123 are floating, the remaining data output port 1124 may input data to the extremum register circuit 13. For each cyclic comparison operation, if the remaining data output port 1124 is floating, the first data output port 1122 and the second data output port 1123 may each input one data to the extremum register circuit 13; the two data are stored in different register storage sections of the scalar register array 112, and the storage addresses of these two sections may be adjacent or non-adjacent. Optionally, the remaining data output port 1124 may output the remaining data stored in the register storage sections of the scalar register array 112 when the multiplexing comparison tree circuit 12 performs the cyclic comparison operation. In addition, when the number of data stored in the scalar register array 112 is odd, the remaining data output port 1124 stays floating until the last comparison operation of the cyclic comparison operation, at which point it outputs the one remaining data in the scalar register array 112.
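The way the three output ports share the stored data can be illustrated with a small sketch. It assumes that adjacent storage sections are paired, which is only one of the options the description allows, and `first_layer_schedule` is a hypothetical helper name.

```python
def first_layer_schedule(stored):
    """Split stored data into pairs for ports 1122/1123 and a leftover for port 1124.

    Returns (pairs, remaining), where remaining is None when the count is even.
    """
    # Each pair corresponds to one comparison operation of the first layer.
    pairs = [(stored[i], stored[i + 1]) for i in range(0, len(stored) - 1, 2)]
    # With an odd count, exactly one datum is left for the remaining data port.
    remaining = stored[-1] if len(stored) % 2 == 1 else None
    return pairs, remaining
```

For five stored data `[5, 2, 8, 1, 6]`, two pairs are issued and the value 6 is left for the remaining data output port.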
According to the data selection device provided by the embodiment, the scalar register array can be used for sequentially storing the received N data, the data stored in the storage section are sequentially input into the multiplexing comparison tree circuit, and the multiplexing comparison tree circuit is used for performing multi-layer cyclic comparison to obtain a final extreme value, so that the N data can be subjected to multi-layer cyclic comparison processing to obtain a maximum value and a minimum value, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 3, the data selection apparatus includes the first-stage multiplexing comparator 121, and the first-stage multiplexing comparator 121 includes: a multiplexing comparator 1211, wherein the multiplexing comparator 1211 is configured to perform a cyclic comparison operation on the data stored in the scalar register array 112 to obtain a maximum vector and a minimum vector.
Specifically, the multiplexing comparator 1211 may perform the first layer of cyclic comparison operation, and each comparison of two data yields the maximum value and the minimum value of the two. It should be noted that if the number of data received by the data reading circuit 11 is N and N is an even number, the number of times the multiplexing comparator 1211 performs the first-layer cyclic comparison operation is equal to N/2; if N is an odd number, the number of times is equal to round(N/2), where round(x) denotes rounding a real number to an integer.
In the data selection device provided by this embodiment, the multiplexing comparator can perform cyclic comparison on multiple data to obtain a final extreme value, so that multiple layers of cyclic comparison processing can be performed on multiple data to obtain a maximum value and a minimum value therein, thereby effectively reducing the amount of computation and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 3, the data selection apparatus includes the second-stage multiplexing comparator 122, and the second-stage multiplexing comparator 122 includes: a first multiplexing comparator 1221 and a second multiplexing comparator 1222, wherein the first multiplexing comparator 1221 is used for comparing two data to obtain a maximum value, and the second multiplexing comparator 1222 is used for comparing two data to obtain a minimum value.
It should be noted that, the first multiplexing comparator 1221 and the second multiplexing comparator 1222 can perform multiple layers of cyclic comparison operations, the comparison result of each layer of cyclic comparison operations can be stored in the extremum register circuit 13, and the extremum register circuit 13 has a corresponding number for each layer of cyclic comparison results. Alternatively, the number of layers of the circular comparison operation performed by the first multiplexing comparator 1221 may be equal to the number of layers of the circular comparison operation performed by the second multiplexing comparator 1222, and the total number of times of the comparison operation of each layer may be equal.
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the computation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
Another embodiment provides a multiplexing compare tree circuit, the multiplexing compare tree circuit 12 comprising: the first-stage multiplexing comparator 121 is configured to compare two data to obtain an extreme value, and the second-stage multiplexing comparator 122 is configured to compare two data to obtain an extreme value.
Specifically, each of the first-stage multiplexing comparator 121 and the second-stage multiplexing comparator 122 may perform cyclic comparison operations on a plurality of data. Optionally, the first-stage multiplexing comparator 121 may perform the first-layer cyclic comparison operation on all data stored in the scalar register array 112, and the result of each comparison operation may be stored in the extremum register circuit 13, which may assign a number to the cyclic comparison results of each layer. Illustratively, the first-layer cyclic comparison results are numbered 1, the second-layer cyclic comparison results are numbered 2, and so on in sequence, until the M-th-layer cyclic comparison results are numbered M. In addition, for each comparison operation, the first-stage multiplexing comparator 121 may receive the data stored in two different storage sections of the scalar register array 112, compare them to obtain the maximum value and the minimum value of the two data, and store the maximum value and the minimum value in the extremum register circuit 13; the two storage sections of the scalar register array 112 are then cleared. For the next comparison operation, the scalar register array 112 may input the data stored in two other storage sections to the first-stage multiplexing comparator 121, and the comparison continues until all the data stored in the scalar register array 112 have been compared, at which point the first-stage multiplexing comparator 121 ends its cyclic operation.
It should be noted that, after each comparison operation, the first-stage multiplexing comparator 121 obtains an extremum and stores it in the extremum register circuit 13. Starting from the first comparison operation, once the first-stage multiplexing comparator 121 has performed two consecutive comparison operations, two extrema have been input to the extremum register circuit 13; both are first-layer cyclic comparison results and both are numbered 1. At this point the second-stage multiplexing comparator 122 can read the two first-layer cyclic comparison results stored in the extremum register circuit 13 and perform the second-layer cyclic comparison operation. In other words, whenever two first-layer cyclic comparison results are stored in the extremum register circuit 13, the second-stage multiplexing comparator 122 can automatically read them and perform the second-layer cyclic comparison operation.
Optionally, the processing of first-layer cyclic comparison results by the second-stage multiplexing comparator 122 may be referred to as the second-layer cyclic comparison operation, and its result may be referred to as the second-layer cyclic comparison result, which is likewise stored in the extremum register circuit 13. The multi-layer cyclic comparison operations are performed in sequence, each layer processing the results of the previous layer. From the second layer onward, the second-stage multiplexing comparator 122 may start the next layer of cyclic comparison operation only after the current layer has finished and all of its results have been stored in the extremum register circuit 13. When the result of some layer of cyclic comparison operation is a single data, the multi-layer cyclic comparison operation ends.
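The overlap between the first two layers can be made concrete with a small trace. The sketch below tracks only maxima and only the first two layers; `trace_overlap`, the list-based register and the tuple trace format are illustrative assumptions rather than parts of the described circuit.

```python
def trace_overlap(data):
    """Return (trace, layer1, layer2) for the first two layers of maximum selection.

    trace lists the comparisons in firing order; layer1 and layer2 hold the
    results still waiting in the extremum register afterwards.
    """
    layer1 = []   # results numbered 1 that have not yet been consumed
    layer2 = []   # results numbered 2
    trace = []
    for i in range(0, len(data) - 1, 2):
        # First-stage comparator: one pair of raw data per operation.
        layer1.append(max(data[i], data[i + 1]))
        trace.append(("stage1", data[i], data[i + 1]))
        if len(layer1) == 2:
            # Two layer-1 results are present, so the second stage fires at once.
            a, b = layer1
            layer1.clear()
            layer2.append(max(a, b))
            trace.append(("stage2", a, b))
    return trace, layer1, layer2
```

In this model the second-stage comparison is interleaved with the remaining first-stage comparisons instead of waiting for the whole first layer to finish.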
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the computation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 3, the data selection apparatus includes the multiplexing comparator 1211, the first multiplexing comparator 1221 and the second multiplexing comparator 1222, and the multiplexing comparator 1211, the first multiplexing comparator 1221 or the second multiplexing comparator 1222 includes: the data processing device comprises a function selection Mode signal input port (Mode)1221a, a first data input port 1221b, a second data input port 1221c, a maximum output port 1221d and a minimum output port 1221e, wherein the function selection Mode signal input port (Mode)1221a is configured to receive a function selection Mode signal corresponding to data with different bit widths to be processed, the first data input port 1221b is configured to receive the input first data, the second data input port 1221c is configured to receive the input second data, the maximum output port 1221d is configured to output a maximum value after each data comparison operation, and the minimum output port 1221e is configured to output a minimum value after each data comparison operation.
Specifically, the function selection Mode signal input port (Mode) 1221a may receive different function selection mode signals. Optionally, there may be a plurality of different function selection mode signals, and under different function selection mode signals the multiplexing comparator 1211, the first multiplexing comparator 1221 and the second multiplexing comparator 1222 may process data having different bit widths. Optionally, for each comparison operation, the first data input port 1221b and the second data input port 1221c may receive two different data stored in the scalar register array 112, or two different data stored in the extremum register circuit 13.
Optionally, the circuit structures of the multiplexing comparator 1211, the first multiplexing comparator 1221 and the second multiplexing comparator 1222 may be the same, and the circuit structure is shown in Fig. 5.
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on a plurality of gated data to obtain a final extreme value, so that the multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the computation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the detailed structural schematic diagram of the data selection apparatus shown in fig. 3, the data selection apparatus includes the extremum register circuit 13, where the extremum register circuit 13 includes: the system comprises a maximum value register file 131 and a minimum value register file 132, wherein the maximum value register file 131 is used for storing maximum values obtained by multi-layer circulation comparison operation, and the minimum value register file 132 is used for storing minimum values obtained by multi-layer circulation comparison operation.
It should be noted that the maximum register file 131 may store the maximum value obtained by each comparison operation of the first-stage multiplexing comparator 121, and may also store the maximum value obtained by each layer of cyclic comparison operation of the second-stage multiplexing comparator 122. Optionally, the number of storage intervals in the maximum register file 131 may be set according to user needs, and the results of the multi-layer cyclic comparison operation may be stored in the maximum register file 131. For example, if the number of data received by the data reading circuit 11 is N, the total number of layers of the cyclic comparison operation may be equal to log₂N. Optionally, the layer number corresponding to the maximum comparison result obtained by each layer of cyclic comparison of the multiplexing comparison tree circuit 12 may be equal to the number of layers of cyclic comparison operation performed by the second multiplexing comparator 1222 plus one. Illustratively, when the second multiplexing comparator 1222 performs its first layer of cyclic comparison operation, the result obtained is the second-layer cyclic comparison result of the multiplexing comparison tree circuit 12.
Optionally, the minimum register file 132 may store the minimum value obtained by each comparison operation of the first-stage multiplexing comparator 121, and may also store the minimum value obtained by each layer of cyclic comparison operation of the second-stage multiplexing comparator 122. Optionally, the number of storage intervals in the minimum register file 132 may be set according to user needs, and the results of the multi-layer cyclic comparison operation may be stored in the minimum register file 132. For example, if the number of data received by the data reading circuit 11 is N, the total number of layers of the cyclic comparison operation may be equal to log₂N. Optionally, the layer number corresponding to the minimum comparison result obtained by each layer of cyclic comparison of the multiplexing comparison tree circuit 12 may be equal to the number of layers of cyclic comparison operation performed by the second multiplexing comparator 1222 plus one. Illustratively, when the second multiplexing comparator 1222 performs its first layer of cyclic comparison operation, the result obtained is the second-layer cyclic comparison result of the multiplexing comparison tree circuit 12.
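The layer count can be checked numerically. The description gives log₂N, which is an integer when N is a power of two; the sketch below assumes, beyond what the text states, that for other values of N the count rounds up because an unpaired value is carried into the next layer. The name `layer_count` is hypothetical.

```python
def layer_count(n):
    """Number of halving layers needed to reduce n data to one value.

    Equals log2(n) when n is a power of two; rounds up otherwise
    (the rounding-up rule is an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("N must be a positive integer")
    # (n - 1).bit_length() is the ceiling of log2(n) for n >= 1.
    return (n - 1).bit_length()
```

For example, 8 data need 3 layers, and under the stated assumption 5 data also need 3 layers.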
In the data selection device provided by this embodiment, the data selection device may perform multi-layer cyclic comparison on multiple data through the first-stage multiplexing comparator and the second-stage multiplexing comparator to obtain a final extreme value, so that the multiple data may be subjected to multi-layer cyclic comparison processing to obtain a maximum value and a minimum value therein, thereby effectively reducing the amount of operation and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural schematic diagram of the data selection device shown in fig. 3, the data selection device includes the maximum register file 131, and the maximum register file 131 includes: a first maximum value output port 1311, a second maximum value output port 1312, a maximum value input port 1313, a third maximum value output port 1314, a comparison level output port 1315, a determination result input port 1316, a remaining data input port 1317 and a maximum value input port 1318, the first maximum value output port 1311 being for outputting a first maximum value, the second maximum value output port 1312 being for outputting a second maximum value, the maximum value input port 1313 being for receiving a maximum value obtained by a next layer comparison operation, the third maximum value output port 1314 being for outputting a maximum value of a plurality of the data, the comparison level output port 1315 being for outputting the number of layers currently subjected to a circular comparison operation by the multiplexing comparison tree circuit 12, the determination result input port 1316 being for receiving a logical determination signal, the remaining data input port 1317 being for receiving the remaining data stored in the storage section at the time of the comparison operation, the maximum input port 1318 is configured to receive a maximum obtained after each data comparison operation.
Specifically, the maximum input port 1313 may receive the maximum result output by the first multiplexing comparator 1221 through the cyclic comparison operation. It should be noted that, if the total number of first-layer comparison operations of the first multiplexing comparator 1221 is N/2, the number of data in the first-layer cyclic comparison result may be N/2, the total number of next-layer comparison operations may be N/4, and so on, until the number of data in the comparison result obtained by the last layer of cyclic comparison operation is equal to 1; the multi-layer cyclic comparison operation then ends, and one data is stored in the maximum value register file 131. In addition, when N = 2^n (where n is a positive integer), the total number of comparison operations of each layer may be equal to the number of data in the comparison result obtained by that layer, and may be equal to 1/2 of the number of comparison operations of the previous layer; when N is odd, or even but not a power of 2, the total number of comparison operations of a layer may not be equal to 1/2 of that of the previous layer. Optionally, the second multiplexing comparator 1222 may compare any two maximum values in the comparison result of the previous layer, and may also compare the results obtained by two adjacent comparison operations of the multiplexing comparator 1211; at this time, the storage sections of the maximum value register file 131 that hold the two maximum values being compared are cleared, and the comparison result obtained by this layer's comparison operation may be stored in one of the cleared storage sections, or in another register storage section in which no data is stored.
Optionally, the determination result input port 1316 is configured to receive a logic determination signal output by the end determination circuit 14.
After the first-layer cyclic comparison operation is completed, if one unprocessed data remains in the scalar register array 112, the maximum register file 131 may receive the remaining data through the remaining data input port 1317, and the multiplexing comparison tree circuit 12 performs the multi-layer cyclic comparison operation on the remaining data together with the first-layer cyclic comparison results. Optionally, the maximum input port 1318 may receive the maximum value obtained by each comparison operation in the first layer of the cyclic comparison operation.
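The per-layer comparison counts for different N can be tabulated with a short helper. It assumes that an unpaired value is carried forward to the next layer without a comparison; `comparisons_per_layer` is a hypothetical name used only for illustration.

```python
def comparisons_per_layer(n):
    """List the number of pairwise comparisons performed in each layer."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    counts = []
    while n > 1:
        counts.append(n // 2)       # one comparison per complete pair
        n = n // 2 + n % 2          # winners plus any unpaired value
    return counts
```

For N = 8 each layer performs half as many comparisons as the previous one (4, 2, 1), whereas for N = 6 the counts are 3, 1, 1, so the halving relation no longer holds.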
According to the data selection device provided by the embodiment, the data selection device can perform multi-layer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, the maximum value obtained by each layer of comparison operation can be stored in the maximum value register file, and data are provided for the next layer of comparison operation, so that multi-layer cyclic comparison processing can be performed on the plurality of data, the maximum value and the minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection device shown in fig. 3, the data selection device includes the minimum register file 132, and the minimum register file 132 includes: a first minimum output port 1321, a second minimum output port 1322, a minimum input port 1323, a third minimum output port 1324, a comparison hierarchy output port 1325, a determination result input port 1326, a remaining data input port 1327, a minimum input port 1328, wherein the first minimum output port 1321 is configured to output a first minimum, the second minimum output port 1322 is configured to output a second minimum, the minimum input port 1323 is configured to receive a minimum obtained by a next comparison operation, the third minimum output port 1324 is configured to output a minimum of a plurality of data, the comparison hierarchy output port 1325 is configured to output a number of layers currently subjected to a comparison operation by the second multiplexing comparator 1222, the determination result input port 1326 is configured to receive a logical determination signal output by the end determination circuit 14, and the remaining data input port 1327 is configured to receive remaining data stored in the comparison operation timing register array 112, the minimum input port 1328 is configured to receive a minimum value obtained after each data comparison operation.
Specifically, the minimum input port 1323 may receive a minimum result output by the first multiplexing comparator 1221 through a cyclic comparison operation. It should be noted that, if the first multiplexing comparator 1221 performs N/2 comparisons in the first-layer comparison operation, the first-layer cyclic comparison result may contain N/2 data, the next layer may then perform N/4 comparisons, and so on; when the comparison result obtained by the last-layer cyclic comparison operation contains exactly one datum, the multi-layer cyclic comparison operation ends, and at this time one floating point number is stored in the minimum value register file 132. Optionally, the second multiplexing comparator 1222 may perform a comparison operation on any two minimum values in the comparison result of the previous layer, or on the results obtained by two adjacent comparison operations of the multiplexing comparator 1211. At this time, the storage intervals in the minimum value register file 132 that hold the two minimum values being compared may be cleared, and the comparison result obtained by this layer's comparison operation may be stored either in the two cleared storage intervals or in other register storage intervals in which no data is stored. Optionally, the judgment result input port 1326 is configured to receive a logic judgment signal output by the end judgment circuit 14.
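The layer-by-layer halving described above can be illustrated with a minimal Python sketch. It is a software analogy only, not the circuit itself: the function name `layered_min` and the choice to carry an unpaired value forward to the next layer are assumptions made for illustration.

```python
def layered_min(values):
    """Reduce values to their minimum one comparison layer at a time.

    Returns (minimum, number_of_layers_run). Hypothetical helper; the
    pairing order is an assumption, not taken from the embodiment.
    """
    if not values:
        raise ValueError("need at least one value")
    layer = list(values)
    layers_run = 0
    while len(layer) > 1:
        # Compare adjacent pairs: N data give about N/2 results.
        nxt = [min(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        # An unpaired last value is carried into the next layer unchanged.
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer = nxt
        layers_run += 1
    return layer[0], layers_run


result = layered_min([5.0, 2.0, 7.5, 1.0, 9.0, 3.0, 4.0, 8.0])
```

With eight inputs the sketch runs 4, 2 and 1 comparisons in successive layers, matching the N/2, N/4, ... progression in the text.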
After the first-layer cyclic comparison operation is completed, if one unprocessed datum remains in the scalar register array 112, the minimum register file 132 may receive the remaining datum through the remaining data input port 1327, and the multiplexing comparison tree circuit 12 may perform the multi-layer cyclic comparison operation on this datum together with the first-layer cyclic comparison result. Optionally, the minimum input port 1328 may receive the minimum value obtained by each comparison operation in the first-layer cyclic comparison operation. When one remaining datum is left in the scalar register array 112 after the first-layer cyclic comparison operation is completed, the remaining datum is input to both the maximum value register file 131 and the minimum value register file 132.
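As a rough illustration of how a leftover datum reaches both register files, the following Python sketch models the first layer in software. The name `first_layer` and the use of plain lists as stand-ins for the maximum and minimum register files are assumptions for illustration only.

```python
def first_layer(data):
    """Model one first-layer pass over the input data.

    Each pair contributes its larger value to max_file and its smaller
    value to min_file. If one datum is left over, it is placed in both
    lists, mirroring the remaining-data path described in the text.
    """
    max_file, min_file = [], []
    for i in range(0, len(data) - 1, 2):
        a, b = data[i], data[i + 1]
        max_file.append(max(a, b))
        min_file.append(min(a, b))
    if len(data) % 2:
        # Odd count: the unpaired datum is a candidate for both extremes.
        max_file.append(data[-1])
        min_file.append(data[-1])
    return max_file, min_file


max_file, min_file = first_layer([4, 9, 2, 7, 5])
```

The leftover must enter both lists because, without having been compared, it could still turn out to be either the overall maximum or the overall minimum.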
According to the data selection device provided by the embodiment, the data selection device can perform multilayer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, the minimum value obtained by each layer of comparison operation can be stored in the minimum value register file to provide data for the next layer of comparison operation, so that the multilayer cyclic comparison processing can be performed on the plurality of data to obtain the maximum value and the minimum value, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection device shown in fig. 3, the data selection device includes the ending judgment circuit 14, and the ending judgment circuit 14 includes: a determining unit 141, where the determining unit 141 is configured to compare the number of layers of the current extremum comparison result with the total number of layers of cyclic comparison operation that the multiplexing comparison tree circuit 12 needs to perform to obtain the final extremum.
It should be noted that, if the number of layers of the current extremum comparison result is equal to the total number of layers of cyclic comparison operation that the multiplexing comparison tree circuit 12 needs to perform to obtain the final extremum, the determination result of the determining unit 141 may be that the multi-layer cyclic comparison operation has ended and the extremum of the multiple data is output; in this case, the multiplexing comparison tree circuit 12 does not need to continue the cyclic comparison operation. Optionally, the extreme value comparison result may be a maximum value comparison result or a minimum value comparison result.
According to the data selection device provided by the embodiment, the data selection device can perform multi-layer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, judge whether cyclic comparison operation is finished through the judgment unit, and if the judgment result of the judgment unit is yes, finish the cyclic comparison operation and output the operation result, so that the multi-layer cyclic comparison processing can be performed on the plurality of data, a maximum value and a minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural schematic diagram of the data selecting apparatus shown in fig. 3, the data selecting apparatus includes the determining unit 141, where the determining unit 141 includes: a comparison level input port 1411 and a judgment result output port 1412, where the comparison level input port 1411 is configured to receive the number of layers corresponding to the cyclic comparison result currently obtained by the extremum register circuit 13, and the judgment result output port 1412 is configured to output a comparison result between the number of layers corresponding to the current extremum comparison result and the total number of layers that the multiplexing comparison tree circuit 12 needs to perform the cyclic comparison operation.
It should be noted that, if the number of layers corresponding to the current extremum comparison result is equal to the number of layers that the multiplexing comparison tree circuit 12 needs to perform the circular comparison operation, the determining unit 141 may input a high level signal to the maximum register file 131 and the minimum register file 132 through the determination result output port 1412, and instruct the maximum register file 131 and the minimum register file 132 to output the operation results respectively.
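The end-of-operation check can be sketched in Python as follows. The embodiment does not state how the total number of layers is derived; the sketch assumes pairwise halving, so that N data need ceil(log2(N)) layers. Both function names are hypothetical.

```python
def total_layers(n):
    """Layers needed to reduce n data to one by pairwise comparison.

    Assumption: each layer halves the data count (rounding up), which
    gives ceil(log2(n)); computed here with integer arithmetic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def end_signal(current_layer, n):
    """Return 1 (high level) when the current layer is the last one, else 0."""
    return 1 if current_layer == total_layers(n) else 0


signal = end_signal(3, 8)
```

In this model a high-level result corresponds to the signal that instructs the maximum and minimum register files to output their results.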
According to the data selection device provided by the embodiment, the data selection device can perform multilayer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, judge whether the multilayer cyclic comparison operation is finished through the judgment unit, and if the judgment result of the judgment unit is yes, finish the multilayer cyclic comparison operation and output the operation result, so that the multilayer cyclic comparison processing can be performed on the plurality of data, the maximum value and the minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Fig. 4 is a schematic diagram illustrating a specific structure of another data selection device according to an embodiment, where the structure of the data selection device shown in fig. 4 is substantially the same as that of the data selection device shown in fig. 3. The differences include the following. The data selection apparatus shown in fig. 4 includes the scalar register array 112, and the scalar register array 112 includes: a data input port 1121, a first data output port 1122, a second data output port 1123, a third data output port 1124, a fourth data output port 1125, a remaining data output port 1126, a first logic signal output port (Sel0) 1127 and a second logic signal output port (Sel1) 1128. The data input port 1121 is configured to receive N data; the first data output port 1122 is configured to output the data stored in each register bank at each comparison operation; the second data output port 1123, the third data output port 1124 and the fourth data output port 1125 are each configured to output the data stored in each register bank; the remaining data output port 1126 is configured to output the remaining data stored in the register bank at the comparison operation; the first logic signal output port (Sel0) 1127 is configured to output a first logic signal; and the second logic signal output port (Sel1) 1128 is configured to output a second logic signal.
Specifically, the first logic signal and the second logic signal may each include a high-level logic signal and a low-level logic signal. Optionally, the number N of data received by the data reading circuit 11 may be an odd number or an even number. Optionally, the first data output port 1122, the second data output port 1123, the third data output port 1124 and the fourth data output port 1125 may all input data to the multiplexing comparison tree circuit 13 for performing multiple layers of cyclic comparison operations.
In the first-level cyclic comparison operation, if N is an odd number and the number of data read in the data read circuit 11 is equal to or less than 1, the first logic signal output port (Sel0)1127 may output a high-level logic signal, and if N is an odd number and the number of data read in the data read circuit 11 is equal to or less than 3, the second logic signal output port (Sel1)1128 may output a high-level logic signal. Alternatively, the first-level cyclic comparison operation may be characterized as a process in which the multiplexing comparison tree circuit 13 performs a cyclic comparison operation on all data stored in the data reading circuit 11. Optionally, in the process of the first-layer cyclic comparison operation, if the number of data in the data reading circuit 11 is less than or equal to 1, the data reading circuit 11 may directly input the remaining data to the extreme register circuit 14, and the first-layer cyclic comparison operation is not required.
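The Sel0/Sel1 conditions stated above can be written out as a small Python sketch. It reads "the number of data read in the data read circuit" as the count of data still held there, which is an interpretation rather than something the text states outright; the function name `select_signals` is hypothetical.

```python
def select_signals(n, remaining):
    """Return (sel0, sel1) as 1 for high level and 0 for low level.

    n:         total number of data received (N in the text)
    remaining: data still held in the data reading circuit
               (assumed interpretation of the source wording)
    """
    n_is_odd = n % 2 == 1
    sel0 = 1 if n_is_odd and remaining <= 1 else 0
    sel1 = 1 if n_is_odd and remaining <= 3 else 0
    return sel0, sel1


signals = select_signals(7, 3)
```

Under this reading Sel1 goes high before Sel0 as the data reading circuit drains, and neither signal goes high when N is even.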
According to the data selection device provided by the embodiment, the received data is sequentially stored in the storage interval according to the first address of the storage interval through the scalar register array, the data stored in the storage interval is sequentially input into the multiplexing comparison tree circuit, and the received data is subjected to multi-layer cyclic comparison through the multiplexing comparison tree circuit to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on a plurality of data to obtain a maximum value and a minimum value, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection device shown in fig. 4, the data selection device includes the selection circuit 12, where the selection circuit 12 includes: a first selection unit 121, a second selection unit 122, a third selection unit 123 and a fourth selection unit 124, wherein the first selection unit 121 is used for gating the first data received by the cyclic comparison operation multiplexing comparison tree circuit 13, the second selection unit 122 is used for gating the second data received by the cyclic comparison operation multiplexing comparison tree circuit 13, the third selection unit 123 is used for gating the third data received by the cyclic comparison operation multiplexing comparison tree circuit 13, and the fourth selection unit 124 is used for gating the fourth data received by the cyclic comparison operation multiplexing comparison tree circuit 13.
Specifically, in the first-level cyclic comparison operation, the first selection unit 121 may gate whether the first data received by the multiplexing comparison tree circuit 13 needs to be input through the extremum register circuit 14 or the data read-in circuit 11. Alternatively, the second selection unit 122 may gate whether the second data received by the multiplexing comparison tree circuit 13 needs to be input through the extremum register circuit 14 or needs to be input through the data reading circuit 11. Alternatively, the third selection unit 123 may gate whether the third data received by the multiplexing comparison tree circuit 13 needs to be input through the extremum register circuit 14 or needs to be input through the data reading circuit 11. Alternatively, the fourth selection unit 124 may gate whether the fourth data received by the multiplexing comparison tree circuit 13 needs to be input through the extremum register circuit 14 or needs to be input through the data reading circuit 11.
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can be gated through the selection circuit to receive four different data, and the multiplexing comparison tree circuit is used for carrying out multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be carried out on the plurality of data to obtain a maximum value and a minimum value, and the operand and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural schematic diagram of the data selecting apparatus shown in fig. 4, the data selecting apparatus includes the first selecting unit 121, where the first selecting unit 121 includes: a first logic signal input port 1211, a first data input port 1212, a recalled first maximum input port 1213 and a first data output port 1214, the first logic signal input port 1211 being for receiving a first logic signal, the first data input port 1212 being for receiving input first data, the recalled first maximum input port 1213 being for receiving a maximum comparison result stored in the extremum register circuit 14, the first data output port 1214 being for outputting the gated first data.
Specifically, the first logic signal input port 1211 may receive a high-level logic signal output from the data reading circuit 11, and may receive a low-level logic signal output from the data reading circuit 11. When the first logic signal input port 1211 receives the high-level logic signal input from the data reading circuit 11, the first selection unit 121 may gate the first maximum value input port 1213, receive one of the maximum value comparison results stored in the extremum register circuit 14, and input the received one of the maximum value comparison results to the multiplexing comparison tree circuit 13 through the first data output port 1214 as the first data for the multiplexing comparison tree circuit 13 to perform the comparison operation. Otherwise, if the first logic signal input port 1211 receives a low-level logic signal input from the scalar register array 112, the first selection unit 121 may gate the first data input port 1212, receive any one of the data stored in the data reading circuit 11, and input the received one of the data to the multiplexing comparison tree circuit 13 through the first data output port 1214 as the first data for the comparison operation performed by the multiplexing comparison tree circuit 13.
According to the data selection device provided by the embodiment, the selection circuit can be used for gating the multiplexing comparison tree circuit to receive four different data, and the multiplexing comparison tree circuit is used for performing multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the operand and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selecting apparatus shown in fig. 4, the data selecting apparatus includes the second selecting unit 122, and the second selecting unit 122 includes: a first logic signal input port 1221, a second data input port 1222, a recall second maximum input port 1223, and a second data output port 1224, the first logic signal input port 1221 for receiving a first logic signal, the second data input port 1222 for receiving incoming second data, the recall second maximum input port 1223 for receiving a maximum comparison result stored in the extremum register circuit 14, and the second data output port 1224 for outputting gated second data.
Specifically, the first logic signal input port 1221 may receive a high-level logic signal input from the data reading circuit 11, or may receive a low-level logic signal input from the data reading circuit 11. If the first logic signal input port 1221 receives the high-level logic signal input from the data reading circuit 11, the second selection unit 122 may gate the second maximum value input port 1223, receive one of the maximum value comparison results stored in the extremum register circuit 14, and input the received maximum value comparison result to the multiplexing comparison tree circuit 13 through the second data output port 1224, as the second data for the comparison operation performed by the multiplexing comparison tree circuit 13. Otherwise, if the first logic signal input port 1221 receives a low-level logic signal input from the scalar register array 112, the second selection unit 122 may gate the second data input port 1222, receive any one of the data stored in the data reading circuit 11, and input the received data to the multiplexing comparison tree circuit 13 through the second data output port 1224, as the second data for the comparison operation performed by the multiplexing comparison tree circuit 13.
According to the data selection device provided by the embodiment, the selection circuit can be used for gating the multiplexing comparison tree circuit to receive four different data, and the multiplexing comparison tree circuit is used for performing multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the operand and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selecting apparatus shown in fig. 4, the data selecting apparatus includes the third selecting unit 123, and the third selecting unit 123 includes: a second logic signal input port 1231, a third data input port 1232, a third minimum input port 1233 and a third data output port 1234, where the second logic signal input port 1231 is configured to receive a second logic signal, the third data input port 1232 is configured to receive input third data, the third minimum input port 1233 is configured to receive a minimum comparison result stored in the extremum register circuit 14, and the third data output port 1234 is configured to output gated third data.
Specifically, the second logic signal input port 1231 may receive a high-level logic signal input from the data reading circuit 11, or may receive a low-level logic signal input from the data reading circuit 11. If the second logic signal input port 1231 receives a high-level logic signal input by the data reading circuit 11, the third selection unit 123 may gate the third minimum input port 1233, receive one minimum comparison result stored in the extremum register circuit 14, and input the received minimum comparison result to the multiplexing comparison tree circuit 13 through the third data output port 1234 as the third data for the comparison operation performed by the multiplexing comparison tree circuit 13. Otherwise, if the second logic signal input port 1231 receives a low-level logic signal input by the scalar register array 112, the third selection unit 123 may gate the third data input port 1232, receive any one of the data stored in the data reading circuit 11, and input the received data into the multiplexing comparison tree circuit 13 through the third data output port 1234 as the third data for the comparison operation performed by the multiplexing comparison tree circuit 13.
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can be gated through the selection circuit to receive four different data, and the multiplexing comparison tree circuit is used for carrying out multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be carried out on the plurality of data to obtain a maximum value and a minimum value, and the operand and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selecting apparatus shown in fig. 4, the data selecting apparatus includes the fourth selecting unit 124, and the fourth selecting unit 124 includes: a second logic signal input port 1241, a fourth data input port 1242, a fourth minimum value input port 1243 and a fourth data output port 1244, where the second logic signal input port 1241 is configured to receive a second logic signal, the fourth data input port 1242 is configured to receive input fourth data, the fourth minimum value input port 1243 is configured to receive a minimum value comparison result stored in the extremum register circuit 14, and the fourth data output port 1244 is configured to output gated fourth data.
Specifically, the second logic signal input port 1241 may receive a high-level logic signal input from the data reading circuit 11, or may receive a low-level logic signal input from the data reading circuit 11. If the second logic signal input port 1241 receives a high-level logic signal input by the data reading circuit 11, the fourth selection unit 124 may gate the fourth minimum value input port 1243, receive one minimum value comparison result stored in the extremum register circuit 14, and input the received minimum value comparison result to the multiplexing comparison tree circuit 13 through the fourth data output port 1244 as the fourth data for the comparison operation performed by the multiplexing comparison tree circuit 13. Otherwise, if the second logic signal input port 1241 receives a low-level logic signal input by the scalar register array 112, the fourth selection unit 124 may gate the fourth data input port 1242, receive any one of the data stored in the data reading circuit 11, and input the received data into the multiplexing comparison tree circuit 13 through the fourth data output port 1244 as the fourth data for the comparison operation performed by the multiplexing comparison tree circuit 13.
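The gating behaviour of the four selection units can be summarised in a short Python sketch. It models each unit as a two-way multiplexer: a high-level signal picks the stored comparison result, a low-level signal picks fresh data. The function names and the choice of which stored result each unit reads are assumptions for illustration; the embodiment only says that "one of" the stored results is received.

```python
def select(logic_signal, data_in, register_in):
    """Two-way mux: high level gates the register value, low level the fresh data."""
    return register_in if logic_signal else data_in


def gate_inputs(sel0, sel1, data, max_results, min_results):
    """Return the four operands handed to the comparison tree.

    sel0 drives the first and second units (maximum side);
    sel1 drives the third and fourth units (minimum side).
    data:        four values from the data reading circuit
    max_results: two stored maximum comparison results (assumed indexing)
    min_results: two stored minimum comparison results (assumed indexing)
    """
    first = select(sel0, data[0], max_results[0])
    second = select(sel0, data[1], max_results[1])
    third = select(sel1, data[2], min_results[0])
    fourth = select(sel1, data[3], min_results[1])
    return first, second, third, fourth


operands = gate_inputs(1, 0, [1, 2, 3, 4], [9, 8], [0, -1])
```

With Sel0 high and Sel1 low, the maximum side reads back stored results while the minimum side still consumes fresh data.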
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can be gated through the selection circuit to receive four different data, and the multiplexing comparison tree circuit is used for carrying out multi-layer cyclic comparison on a plurality of data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be carried out on the plurality of data to obtain a maximum value and a minimum value, and the operand and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Another embodiment provides another multiplexing comparison tree circuit, where the multiplexing comparison tree circuit 13 includes: a first-stage multiplexing comparator 131 and a second-stage multiplexing comparator 132, where the first-stage multiplexing comparator 131 is configured to compare two data to obtain an extreme value, and the second-stage multiplexing comparator 132 is configured to compare two data to obtain an extreme value.
Specifically, each of the first-stage multiplexing comparator 131 and the second-stage multiplexing comparator 132 may perform a cyclic comparison operation on a plurality of data. Optionally, the first-stage multiplexing comparator 131 may include a plurality of multiplexing comparators, and the second-stage multiplexing comparator 132 may also include a plurality of multiplexing comparators, wherein each of the multiplexing comparators in the first-stage multiplexing comparator 131 and the second-stage multiplexing comparator 132 receives the same function selection mode signal. Optionally, the first-stage multiplexing comparator 131 may perform a first-layer cyclic comparison operation on all data gated by the selection circuit 12, and the result obtained by each comparison operation may be input into the extremum register circuit 14 for storage, where the extremum register circuit 14 may number each layer of cyclic comparison results accordingly. Illustratively, the first-layer cyclic comparison result is numbered 1, the second-layer cyclic comparison result is numbered 2, and the layers are numbered sequentially in this way until the cyclic comparison result of the last layer (i.e., the M-th layer) is numbered M.
In addition, in each comparison operation of the first-layer cyclic comparison operation, the first-stage multiplexing comparator 131 may receive the two data gated by the first selection unit 121 and the second selection unit 122, compare them to obtain the maximum value and the minimum value of the two data, and store the maximum value and the minimum value in the extremum register circuit 14. If the data gated by the first selection unit 121 and/or the second selection unit 122 is input through the scalar register array 112, the storage interval of that one or those two data in the scalar register array 112 is automatically cleared. The process of the next comparison operation is the same as that of the previous comparison operation and is not described herein again. When all the data stored in the scalar register array 112 has been selected, the first-stage multiplexing comparator 131 ends the first-layer cyclic comparison operation. Optionally, the number of layers corresponding to the maximum comparison result obtained by each layer of cyclic comparison operation of the multiplexing comparison tree circuit 13 may be equal to the sum of the total number of layers of cyclic comparison operation currently performed by the first-stage multiplexing comparator 131 and the second-stage multiplexing comparator 132. Optionally, if the second-stage multiplexing comparator 132 performs its first layer of cyclic comparison operation, the cyclic comparison result so obtained is the second-layer cyclic comparison result of the multiplexing comparison tree circuit 13.
Optionally, the number of layers corresponding to the minimum comparison result obtained by comparing each layer of the multiplexing comparison tree circuit 13 in a cyclic manner may be equal to the sum of the total number of layers currently subjected to the cyclic comparison operation by the first-stage multiplexing comparator 131 and the second-stage multiplexing comparator 132.
It should be noted that after each comparison operation of the first-layer cyclic comparison operation, the first-stage multiplexing comparator 131 obtains an extremum and stores it in the extremum register circuit 14. Starting from the first comparison operation, once the first-stage multiplexing comparator 131 has performed two successive comparison operations, two extreme values are obtained; both may be referred to as first-layer cyclic comparison results, and both are numbered 1. At this time, the second-stage multiplexing comparator 132 can read the two first-layer cyclic comparison results stored in the extremum register circuit 14 to perform the second-layer cyclic comparison operation. That is, as long as the extremum register circuit 14 holds two first-layer cyclic comparison results, the second-stage multiplexing comparator 132 can automatically read them to perform the second-layer cyclic comparison operation.
Optionally, the processing of the first-layer cyclic comparison results by the second-stage multiplexing comparator 132 may be referred to as the second-layer cyclic comparison operation, and the result it produces may be referred to as the second-layer cyclic comparison result, which may likewise be stored in the extremum register circuit 14. The multi-layer cyclic comparison operation proceeds in this order, with each layer processing the cyclic comparison results of the previous layer. From the second-layer cyclic comparison operation onward, however, each layer must be completed and all of its results stored in the extremum register circuit 14 before the next layer is performed, by either the first-stage multiplexing comparator 131 or the second-stage multiplexing comparator 132; when the result of some layer of cyclic comparison operation is a single datum, the multi-layer cyclic comparison operation is complete. Optionally, the first-stage multiplexing comparator 131 and the second-stage multiplexing comparator 132 may alternately perform multiple layers of cyclic comparison operations, where the first-stage multiplexing comparator 131 may perform even-level cyclic comparison operations, and the second-stage multiplexing comparator 132 may perform odd-level cyclic comparison operations.
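The idea that a higher layer can start as soon as two results of the layer below are available can be illustrated with a Python sketch for the maximum. It is a simplified software model: it lets every layer start early in the same way, whereas the text requires layers after the second to wait for the previous layer to finish. The name `streaming_max`, the `pending` dictionary standing in for the extremum register circuit, and the layer-0 label for raw inputs are all assumptions.

```python
def streaming_max(data):
    """Return (maximum, log) where log lists (layer, a, b, result) per comparison."""
    if not data:
        raise ValueError("need at least one value")
    pending = {}  # layer number -> one result waiting for a partner
    log = []

    def store(layer, value):
        # While a partner numbered with the same layer is waiting, compare
        # the two and promote the result to the next layer.
        while layer in pending:
            partner = pending.pop(layer)
            result = max(partner, value)
            log.append((layer + 1, partner, value, result))
            layer, value = layer + 1, result
        pending[layer] = value

    for x in data:
        store(0, x)  # layer 0 marks raw input data

    # Fold any values left without a partner, lowest layer first.
    leftovers = [pending[k] for k in sorted(pending)]
    result = leftovers[0]
    for v in leftovers[1:]:
        result = max(result, v)
    return result, log


maximum, log = streaming_max([3, 8, 1, 6])
```

For four inputs the log shows two layer-1 comparisons followed immediately by one layer-2 comparison, which is the hand-off between the two comparator stages that the text describes.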
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison on the gated data to obtain a final extreme value, so that multi-layer cyclic comparison processing can be performed on the data to obtain a maximum value and a minimum value, and the computation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the first-stage multiplexing comparator 131, and the first-stage multiplexing comparator 131 includes: a first multiplexing comparator 1311 and a second multiplexing comparator 1312, wherein the first multiplexing comparator 1311 is configured to perform a cyclic comparison operation on the first data gated by the selection circuit 12 to obtain a maximum value vector and a minimum value vector, and the second multiplexing comparator 1312 is configured to perform a cyclic comparison operation on the second data gated by the selection circuit 12 to obtain a maximum value vector and a minimum value vector.
Specifically, the first multiplexing comparator 1311 and the second multiplexing comparator 1312 may each perform the first-layer cyclic comparison operation, each time comparing two data to obtain the maximum value and the minimum value of the two. It should be noted that, if the number of data received by the data reading circuit 11 is N and N is an even number (a multiple of 4), the number of times that the first multiplexing comparator 1311 and the second multiplexing comparator 1312 can perform the first-layer cyclic comparison operation is equal to N/4; if N is an odd number, the number of times that the first multiplexing comparator 1311 and the second multiplexing comparator 1312 can perform the cyclic comparison operation is equal to round(N/4), where round() may represent rounding of a real number.
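The counts above can be checked with a short Python sketch. It assumes, as one possible reading, that the N data form N/2 pairs shared between the two comparators, so that each comparator performs N/4 comparisons when N is a multiple of 4. The function name `first_layer_schedule` and the way an odd pair count is split are hypothetical; the source does not specify them.

```python
def first_layer_schedule(n):
    """Return (ops_comparator_1, ops_comparator_2, leftover) for n inputs."""
    # Every first-layer comparison consumes two data; one datum is left over when n is odd.
    pairs, leftover = divmod(n, 2)
    # Assumption: pairs are shared as evenly as possible, with the first
    # comparator taking the extra pair when the pair count is odd.
    ops_second = pairs // 2
    ops_first = pairs - ops_second
    return ops_first, ops_second, leftover
```

For N = 8 the sketch gives two comparisons per comparator, which matches N/4; for N = 9 it gives two comparisons per comparator and one leftover datum.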
In the data selection device provided by this embodiment, the first multiplexing comparator and the second multiplexing comparator can perform first-level cyclic comparison operation on gated data to obtain an extreme value, and then perform multi-level cyclic comparison processing on the gated data by the first multiplexing comparator or the second multiplexing comparator to obtain a final maximum value and a final minimum value, so that the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the first multiplexing comparator 1311, and the first multiplexing comparator 1311 includes: a function selection mode signal input port (Mode) 1311a, a first data input port 1311b, a second data input port 1311c, a maximum value first output port 1311d, and a minimum value first output port 1311e, where the function selection mode signal input port (Mode) 1311a is configured to receive a function selection mode signal corresponding to data with different bit widths to be processed, the first data input port 1311b is configured to receive first data input by the selection circuit 12, the second data input port 1311c is configured to receive second data input by the selection circuit 12, the maximum value first output port 1311d is configured to output a maximum value obtained by the comparison operation, and the minimum value first output port 1311e is configured to output a minimum value obtained by the comparison operation.
Specifically, each time, the first multiplexing comparator 1311 may perform a comparison operation on the first data and the second data input by the selection circuit 12 and received through the first data input port 1311b and the second data input port 1311c, output and store the obtained maximum value to the maximum value register circuit through the maximum value first output port 1311d, and output and store the obtained minimum value to the minimum value register circuit through the minimum value first output port 1311e.
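The port behaviour of a single multiplexing comparator can be modelled as a function with one mode input, two data inputs, and two outputs. The Python sketch below is illustrative only: the names `MODE_FORMATS`, `quantize`, and `multiplexing_compare` are hypothetical, the mapping of mode values to 16-, 32-, and 64-bit floating-point formats is an assumed example (the embodiment leaves this correspondence configurable), and NaN handling is not modelled.

```python
import struct

# Assumed mode-to-format mapping; the mode signal selects the bit width of the data.
MODE_FORMATS = {0: "e", 1: "f", 2: "d"}  # IEEE 754 half, single, double precision


def quantize(value, mode):
    """Round value to the floating-point width selected by mode."""
    fmt = "<" + MODE_FORMATS[mode]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def multiplexing_compare(mode, first, second):
    """Return (maximum, minimum) of two inputs at the selected bit width."""
    a = quantize(first, mode)
    b = quantize(second, mode)
    # One comparison yields both outputs, as with the two output ports.
    return (a, b) if a >= b else (b, a)
```

A call such as `multiplexing_compare(2, 1.5, -2.0)` returns the pair `(1.5, -2.0)`; the first element corresponds to the maximum value output port and the second to the minimum value output port.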
In the data selection device provided by this embodiment, the first multiplexing comparator and the second multiplexing comparator can perform first-level cyclic comparison operation on gated data to obtain an extreme value, and then perform multi-level cyclic comparison processing on the gated data by the first multiplexing comparator or the second multiplexing comparator to obtain a final maximum value and a final minimum value, so that the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the second multiplexing comparator 1312, and the second multiplexing comparator 1312 includes: the function selection Mode signal input port (Mode)1312a, the third data input port 1312b, the fourth data input port 1312c, the maximum value second output port 1312d and the minimum value second output port 1312e, wherein the function selection Mode signal input port (Mode)1312a is used for receiving a function selection Mode signal corresponding to data with different bit widths to be processed, the third data input port 1312b is used for receiving third data input by the selection circuit 12, the fourth data input port 1312c is used for receiving fourth data input by the selection circuit 12, the maximum value second output port 1312d is used for outputting a maximum value obtained by comparison operation, and the minimum value second output port 1312e is used for outputting a minimum value obtained by comparison operation.
Specifically, each time, the second multiplexing comparator 1312 may perform a comparison operation on the third data and the fourth data input by the selection circuit 12 and received through the third data input port 1312b and the fourth data input port 1312c, output and store the obtained maximum value into the maximum value register circuit through the maximum value second output port 1312d, and output and store the obtained minimum value into the minimum value register circuit through the minimum value second output port 1312e.
In the data selection device provided by this embodiment, the first multiplexing comparator and the second multiplexing comparator can perform first-level cyclic comparison operation on gated data to obtain an extreme value, and then perform multi-level cyclic comparison processing on the gated data by the first multiplexing comparator or the second multiplexing comparator to obtain a final maximum value and a final minimum value, thereby effectively reducing the operation amount and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the second-stage multiplexing comparator 132, and the second-stage multiplexing comparator 132 includes: a third multiplexing comparator 1321 and a fourth multiplexing comparator 1322, where the third multiplexing comparator 1321 is configured to compare two data to obtain a maximum value, and the fourth multiplexing comparator 1322 is configured to compare two floating-point numbers to obtain a minimum value.
It should be noted that, the third multiplexing comparator 1321 and the fourth multiplexing comparator 1322 can perform multi-layer cyclic comparison operation, the comparison result of each layer of cyclic comparison operation can be stored in the extremum register circuit 14, and the extremum register circuit 14 has a corresponding number for each layer of cyclic comparison result. Optionally, the number of layers of the cyclic comparison operation performed by the third multiplexing comparator 1321 and the number of layers of the cyclic comparison operation performed by the fourth multiplexing comparator 1322 may be equal to or not equal to each other, and in addition, the total number of times of the comparison operation of each layer may be equal to each other.
According to the data selection device provided by the embodiment, the multiplexing comparison tree circuit can perform multi-layer cyclic comparison operation on a plurality of gated data to obtain a final extreme value, so that the multi-layer cyclic comparison processing can be performed on the plurality of data to obtain a maximum value and a minimum value, and the operation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the third multiplexing comparator 1321, and the third multiplexing comparator 1321 includes: a function selection mode signal input port (Mode) 1321a, a call maximum first input port 1321b, a call maximum second input port 1321c, and a maximum third output port 1321d, where the function selection mode signal input port (Mode) 1321a is configured to receive a function selection mode signal corresponding to data with different bit widths that needs to be processed, the call maximum first input port 1321b is configured to read a first maximum comparison result stored in the extremum register circuit 14, the call maximum second input port 1321c is configured to read a second maximum comparison result stored in the extremum register circuit 14, and the maximum third output port 1321d is configured to output a maximum obtained by the comparison operation.
Specifically, each time, the third multiplexing comparator 1321 may read the two maximum value comparison results stored in the extremum register circuit 14 through the call maximum first input port 1321b and the call maximum second input port 1321c, perform a comparison operation, and output and store the obtained maximum value into the extremum register circuit 14 through the maximum third output port 1321d.
According to the data selection device provided by the embodiment, multilayer cyclic comparison processing is performed through the third multiplexing comparator to obtain a final maximum value and a final minimum value, so that the computation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the fourth multiplexing comparator 1322, where the fourth multiplexing comparator 1322 includes: a function selection mode signal input port (Mode) 1322a, a call minimum first input port 1322b, a call minimum second input port 1322c, and a minimum fourth output port 1322d, where the function selection mode signal input port (Mode) 1322a is configured to receive a function selection mode signal corresponding to data with different bit widths that needs to be processed, the call minimum first input port 1322b is configured to read a first minimum comparison result stored in the extremum register circuit 14, the call minimum second input port 1322c is configured to read a second minimum comparison result stored in the extremum register circuit 14, and the minimum fourth output port 1322d is configured to output a minimum obtained by the comparison operation.
Specifically, each time, the fourth multiplexing comparator 1322 may read the two minimum value comparison results stored in the extremum register circuit 14 through the call minimum first input port 1322b and the call minimum second input port 1322c, perform a comparison operation, and output and store the obtained minimum value into the extremum register circuit 14 through the minimum fourth output port 1322d.
Optionally, the circuit structures of the first multiplexing comparator 1311, the second multiplexing comparator 1312, the third multiplexing comparator 1321 and the fourth multiplexing comparator 1322 may be identical, and the circuit structure diagram is shown in fig. 5.
According to the data selection device provided by the embodiment, multilayer cyclic comparison processing is performed through the fourth multiplexing comparator to obtain a final maximum value and a final minimum value, so that the computation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the detailed structural schematic diagram of the data selection apparatus shown in fig. 4, the data selection apparatus includes the extremum register circuit 14, and the extremum register circuit 14 includes: a maximum value register file 141 and a minimum value register file 142, wherein the maximum value register file 141 is used for storing a maximum value obtained by the multi-layer cyclic comparison operation, and the minimum value register file 142 is used for storing a minimum value obtained by the multi-layer cyclic comparison operation.
It should be noted that the maximum value register file 141 may store the maximum value obtained by the first-stage multiplexing comparator 131 through each layer of the cyclic comparison operation, and may also store the maximum value obtained by the second-stage multiplexing comparator 132 through each layer of the cyclic comparison operation. Optionally, the number of storage intervals in the maximum value register file 141 may be set according to user needs, and in addition, the results of the multi-layer cyclic comparison operation may be stored in the maximum value register file 141. For example, if the number of data received by the data reading circuit 11 is N, the total number of layers of the cyclic comparison operation may be equal to log2(N), and the results of the log2(N) layers of cyclic comparison operations may be stored in the maximum value register file 141.
Optionally, the minimum value register file 142 may store a minimum value obtained by each comparison operation of the first-stage multiplexing comparator 131, and may also store a minimum value obtained by each layer of cyclic comparison operations of the second-stage multiplexing comparator 132. Optionally, the number of storage intervals in the minimum value register file 142 may be set according to user requirements, and in addition, the results of the multi-layer cyclic comparison operation may be stored in the minimum value register file 142. For example, if the number of data received by the data reading circuit 11 is N, the total number of layers of the cyclic comparison operation may be equal to log2(N), and the results of the log2(N) layers of cyclic comparison operations may be stored in the minimum value register file 142.
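The relationship between N and the number of layers can be made concrete by listing how many results each layer would leave in the register file. The Python sketch below uses a hypothetical helper, `layer_sizes`, and assumes that an unpaired result is carried into the next layer; for N a power of two, the number of layers it reports is the base-2 logarithm of N stated above.

```python
def layer_sizes(n):
    """Return the number of results stored after each comparison layer."""
    sizes = []
    remaining = n
    while remaining > 1:
        # Each layer halves the number of results; an odd count rounds up
        # because the unpaired result is assumed to be carried forward.
        remaining = (remaining + 1) // 2
        sizes.append(remaining)
    return sizes
```

For N = 8 the sketch yields `[4, 2, 1]`, that is, three layers, so a register file sized for this case would need room for the results of three layers.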
In the data selection device provided by this embodiment, the data selection device may perform multi-layer cyclic comparison on multiple data through the first-stage multiplexing comparator and the second-stage multiplexing comparator to obtain a final extreme value, so that the multiple data may be subjected to multi-layer cyclic comparison processing to obtain a maximum value and a minimum value therein, thereby effectively reducing the amount of operation and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural schematic diagram of the data selection device shown in fig. 4, the data selection device includes the maximum value register file 141, and the maximum value register file 141 includes: a remaining data input port 1410, a call maximum first output port 1411, a call maximum second output port 1412, a maximum third input port 1413, a call first maximum output port 1414, a call second maximum output port 1415, a maximum first input port 1416a, a maximum second input port 1416b, a maximum output port 1417, a comparison level output port 1418, and a determination result input port 1419. The remaining data input port 1410 is configured to receive the remaining data stored in the register storage interval during the comparison operation; the call maximum first output port 1411 is configured to output a first maximum comparison result; the call maximum second output port 1412 is configured to output a second maximum comparison result; the maximum third input port 1413 is configured to receive a maximum obtained by the comparison operation; the call first maximum output port 1414 is configured to output a first maximum comparison result; the call second maximum output port 1415 is configured to output a second maximum comparison result; the maximum first input port 1416a is configured to receive a first maximum obtained by the comparison operation; the maximum second input port 1416b is configured to receive a second maximum obtained by the comparison operation; the maximum output port 1417 is configured to output the final maximum obtained by the multi-layer cyclic comparison operation; the comparison level output port 1418 is configured to output the number of layers corresponding to the current cyclic comparison result; and the determination result input port 1419 is configured to receive the result of comparing the number of layers of the current maximum comparison result with the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform.
Specifically, when the third multiplexing comparator 1321 performs the cyclic comparison operation, the third multiplexing comparator 1321 can receive two different comparison results obtained by the previous layer of cyclic comparison operation through the call maximum first output port 1411 and the call maximum second output port 1412; after each comparison operation is finished, the maximum third input port 1413 can receive the maximum value result output by the third multiplexing comparator 1321; and after the multi-layer cyclic comparison operation is finished, the final maximum value is output through the maximum output port 1417. Optionally, the comparison level output port 1418 may output the number of layers corresponding to the maximum comparison result currently stored in the maximum value register file 141. If the number of layers corresponding to the maximum comparison result currently stored in the maximum value register file 141 is equal to the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform, the determination result input port 1419 may receive the high-level logic signal input by the ending judgment circuit 15, and at this time, the maximum output port 1417 may output the final comparison result. Otherwise, the determination result input port 1419 may receive the low-level logic signal input by the ending judgment circuit 15, and at this time, the multiplexing comparison tree circuit 13 still needs to continue the comparison operation until the determination result input port 1419 receives the high-level logic signal, whereupon the multi-layer cyclic comparison operation ends.
When one unprocessed datum remains stored in the scalar register array 112 after the first-layer cyclic comparison operation is completed, the maximum value register file 141 may receive the remaining datum through the remaining data input port 1410, and the multiplexing comparison tree circuit 13 performs the multi-layer cyclic comparison operation on this datum together with the first-layer cyclic comparison results. Optionally, the maximum first input port 1416a may receive the first maximum obtained by the comparison operation of the first multiplexing comparator 1311, and the maximum second input port 1416b may receive the second maximum obtained by the comparison operation of the second multiplexing comparator 1312.
According to the data selection device provided by the embodiment, the comparison result obtained by each layer of cyclic comparison operation can be stored by the extreme value register circuit, so that the result of the previous layer of comparison operation can be directly called when the next layer of cyclic comparison operation is carried out, multiple layers of cyclic comparison processing are carried out on a plurality of data, the maximum value and the minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection device shown in fig. 4, the data selection device includes the minimum value register file 142, and the minimum value register file 142 includes: a remaining data input port 1420, a call minimum first output port 1421, a call minimum second output port 1422, a minimum fourth input port 1423, a call third minimum output port 1424, a call fourth minimum output port 1425, a minimum first input port 1426a, a minimum second input port 1426b, a minimum output port 1427, a comparison level output port 1428, and a determination result input port 1429. The remaining data input port 1420 is configured to receive the remaining data stored in the register storage interval during the comparison operation; the call minimum first output port 1421 is configured to output a third minimum comparison result; the call minimum second output port 1422 is configured to output a fourth minimum comparison result; the minimum fourth input port 1423 is configured to receive a minimum obtained by the comparison operation; the call third minimum output port 1424 is configured to output a third minimum comparison result; the call fourth minimum output port 1425 is configured to output a fourth minimum comparison result; the minimum first input port 1426a is configured to receive a first minimum obtained by the comparison operation; the minimum second input port 1426b is configured to receive a second minimum obtained by the comparison operation; the minimum output port 1427 is configured to output the final minimum obtained by the multi-layer cyclic comparison operation; the comparison level output port 1428 is configured to output the number of layers corresponding to the current cyclic comparison result; and the determination result input port 1429 is configured to receive the result of comparing the number of layers of the current minimum comparison result with the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform.
Specifically, when the fourth multiplexing comparator 1322 performs the cyclic comparison operation, the fourth multiplexing comparator 1322 may receive two different comparison results obtained by the previous layer of cyclic comparison operation through the call minimum first output port 1421 and the call minimum second output port 1422; after each comparison operation is completed, the minimum fourth input port 1423 may receive the minimum value result output by the fourth multiplexing comparator 1322; and after the multi-layer cyclic comparison operation is completed, the final minimum value is output through the minimum output port 1427. Optionally, the comparison level output port 1428 may output the number of layers corresponding to the minimum comparison result currently stored in the minimum value register file 142. If the number of layers corresponding to the minimum comparison result currently stored in the minimum value register file 142 is equal to the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform, the determination result input port 1429 may receive the high-level logic signal input by the ending judgment circuit 15, and at this time, the minimum output port 1427 may output the final comparison result. Otherwise, the determination result input port 1429 may receive the low-level logic signal input by the ending judgment circuit 15; at this time, the multiplexing comparison tree circuit 13 needs to continue the comparison operation until the determination result input port 1429 receives the high-level logic signal, whereupon the multi-layer cyclic comparison operation ends.
When one unprocessed datum remains stored in the scalar register array 112 after the first-layer cyclic comparison operation is completed, the minimum value register file 142 may receive the remaining datum through the remaining data input port 1420, and the multiplexing comparison tree circuit 13 performs the multi-layer cyclic comparison operation on this datum together with the first-layer cyclic comparison results. Optionally, the minimum first input port 1426a may receive a first minimum obtained by the comparison operation of the first multiplexing comparator 1311, and the minimum second input port 1426b may receive a second minimum obtained by the comparison operation of the second multiplexing comparator 1312.
According to the data selection device provided by the embodiment, the comparison result obtained by each layer of cyclic comparison operation can be stored by the extreme value register circuit, so that the result of the previous layer of comparison operation can be directly called when the next layer of cyclic comparison operation is carried out, multiple layers of cyclic comparison processing are carried out on a plurality of data, the maximum value and the minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selection device shown in fig. 4, the data selection device includes the ending judgment circuit 15, where the ending judgment circuit 15 includes: a determining unit 151, where the determining unit 151 is configured to compare the number of layers of the current extremum comparison result with the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform to obtain the final extremum.
It should be noted that, if the number of layers of the current extremum comparison result is equal to the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform to obtain the final extremum, the determination result of the determining unit 151 may be that the multi-layer cyclic comparison operation has ended and that the extremum among the plurality of data is to be output; in this case, the multiplexing comparison tree circuit 13 does not need to continue the cyclic comparison operation. Optionally, the extremum comparison result may be a maximum value comparison result, and may also be a minimum value comparison result.
According to the data selection device provided by the embodiment, the data selection device can perform multi-layer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, judge whether cyclic comparison operation is finished through the judgment unit, and if the judgment result of the judgment unit is yes, finish the cyclic comparison operation and output the operation result, so that the multi-layer cyclic comparison processing can be performed on the plurality of data, a maximum value and a minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In one embodiment, continuing with the specific structural diagram of the data selecting apparatus shown in fig. 4, the data selecting apparatus includes the determining unit 151, and the determining unit 151 includes: a comparison level input port 1511 and a determination result output port 1512, where the comparison level input port 1511 is configured to receive the number of layers corresponding to the cyclic comparison result currently held by the extremum register circuit 14, and the determination result output port 1512 is configured to output the result of comparing the number of layers corresponding to the current extremum comparison result with the total number of layers of cyclic comparison operations that the multiplexing comparison tree circuit 13 needs to perform.
It should be noted that, when the number of layers corresponding to the current extremum comparison result is equal to the total number of layers that the multiplexing comparison tree circuit 13 needs to perform the circular comparison operation, the determining unit 151 may input a high level signal to the maximum register file 141 and the minimum register file 142 through the determination result output port 1512, and instruct the maximum register file 141 and the minimum register file 142 to output the operation results respectively.
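The termination check can be sketched as a comparison that drives a loop. In the Python sketch below, the names `judge` and `run_until_done` are hypothetical; the values 1 and 0 stand in for the high-level and low-level signals, only the maximum path is modelled, and the input list is assumed to be non-empty.

```python
def judge(current_layer, total_layers):
    """Return 1 (high level) when the current layer equals the total, else 0 (low level)."""
    return 1 if current_layer == total_layers else 0


def run_until_done(values, total_layers):
    """Run comparison layers until the judge signals completion; return (maximum, layers run)."""
    maxima = list(values)
    layer = 0
    # A low-level signal means the comparison tree must run another layer.
    while judge(layer, total_layers) == 0:
        # Compare adjacent pairs; a slice holding one value passes through unchanged.
        maxima = [max(maxima[i:i + 2]) for i in range(0, len(maxima), 2)]
        layer += 1
    # A high-level signal releases the stored result as the final output.
    return maxima[0], layer
```

With four inputs and a total of two layers, `run_until_done([1, 9, 4, 7], 2)` returns `(9, 2)`: the judge reports low level after layers 0 and 1 and high level once layer 2 is reached.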
According to the data selection device provided by the embodiment, the data selection device can perform multi-layer cyclic comparison processing on a plurality of data through the first-stage multiplexing comparator and the second-stage multiplexing comparator, judge whether the multi-layer cyclic comparison operation is finished through the judgment unit, if the judgment result of the judgment unit is yes, finish the multi-layer cyclic comparison operation and output the operation result, so that the multi-layer cyclic comparison processing can be performed on the plurality of data, the maximum value and the minimum value are obtained, and the operation amount and the delay inside the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Fig. 6 is a schematic flow chart of a data processing method according to an embodiment, which can be processed by the data selecting apparatus shown in fig. 3, and this embodiment relates to a process of performing multi-layer cyclic comparison operation on a plurality of data to select an extremum. As shown in fig. 6, the method includes:
S101, receiving data to be processed.
Specifically, the data selection device may receive N pieces of data to be processed through the data read-in circuit, and the data selection device may also receive different function selection mode signals through the multiplexing comparison tree circuit. Optionally, the data may be floating point numbers. Optionally, the number N of the data to be processed received by the data reading circuit may be greater than 2.
It should be noted that, if the multiplexing comparison tree circuit receives different function selection mode signals, it indicates that the multiplexing comparison tree circuit can perform comparison operation on data with different corresponding bit widths; meanwhile, the correspondence between the function selection mode signals and the data bit widths that the multiplexing comparison tree circuit can process can be flexibly set, which is not limited in this embodiment. For example, if the multiplexing comparison tree circuit can receive three function selection mode signals, represented by Mode = 0, Mode = 1, and Mode = 2, then Mode = 0 may indicate that the multiplexing comparison tree circuit can process 16-bit floating point numbers, Mode = 1 may indicate that it can process 32-bit floating point numbers, and Mode = 2 may indicate that it can process 64-bit floating point numbers; alternatively, Mode = 0 may indicate that it can process 32-bit floating point numbers, Mode = 1 may indicate that it can process 64-bit floating point numbers, and Mode = 2 may indicate that it can process 16-bit floating point numbers.
It should be noted that, if the bit width of the data to be processed received by the data reading circuit is not equal to the processable data bit width corresponding to the function selection mode signal received by the multiplexing comparison tree circuit, the multiplexing comparison tree circuit divides the received data to be processed into multiple groups of data having the same bit width as the currently processable data of the multiplexing comparison tree circuit according to the bit width of the data currently processable by the multiplexing comparison tree circuit, and performs parallel processing on the multiple groups of data, where the bit width of the data to be processed received by the data reading circuit may be greater than the bit width of the data currently processable by the multiplexing comparison tree circuit. Alternatively, the parallel processing may be characterized in that the divided data to be processed of each group are processed simultaneously. If the bit width of the data to be processed received by the data reading circuit is equal to the bit width of the processable data corresponding to the function selection mode signal received by the multiplexing comparison tree circuit, the multiplexing comparison tree circuit can directly process the received data to be processed.
S102, performing multi-layer cyclic comparison operation on the data to be processed through a multiplexing comparison tree circuit.
It should be noted that, during each comparison operation, the multiplexing comparison tree circuit may compare two data to obtain an extremum value in the two data, and during each comparison operation, the multiplexing comparison tree circuit may receive two data to be processed input by the data reading circuit.
S103, judging, through an ending judgment circuit, whether the condition for ending the multi-layer cyclic comparison operation is met.
Specifically, after each comparison operation is finished, whether the condition for finishing the multi-layer cyclic comparison operation is currently met can be judged through the finishing judging circuit.
S104, outputting a vector extreme value if the condition for ending the multi-layer cyclic comparison operation is met.
Specifically, if the end judgment circuit judges that the comparison operation is ended, the multi-layer cyclic comparison operation can be ended, the comparison operation is stopped, and the final vector extremum is output through the extremum register circuit. Optionally, the extreme value of the vector may be characterized as an extreme value in all the data to be processed received by the data reading circuit.
In the data processing method provided by this embodiment, to-be-processed data is received, the to-be-processed data is input to a multiplexing comparison tree circuit, the to-be-processed data is cyclically compared through the multiplexing comparison tree circuit, whether a condition for ending multi-layer cyclic comparison operation is met or not is judged through an ending judgment circuit, and if the condition for ending comparison operation is met, a vector extreme value is output, in the process, extreme values in multiple to-be-processed data can be obtained through multi-layer cyclic comparison operation, so that the operation amount and the delay in a data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
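The bit-width handling described for S101 can be illustrated with a minimal Python sketch. It assumes one possible correspondence between the function selection mode signal and the processable bit width (Mode = 0, 1, 2 mapped to 16, 32, 64 bits), which the embodiment leaves configurable, and it assumes that group 0 occupies the least significant bits. The names `MODE_TO_WIDTH` and `split_into_lanes` are hypothetical and do not come from the embodiment.

```python
from typing import List

# Assumed mapping from function selection mode signal to processable bit width.
# The embodiment states that this correspondence can be set flexibly.
MODE_TO_WIDTH = {0: 16, 1: 32, 2: 64}


def split_into_lanes(word: int, word_width: int, mode: int) -> List[int]:
    """Split one input word into groups whose width matches the selected mode."""
    lane_width = MODE_TO_WIDTH[mode]
    if word_width < lane_width or word_width % lane_width != 0:
        raise ValueError("input width must be a whole multiple of the lane width")
    mask = (1 << lane_width) - 1
    lane_count = word_width // lane_width
    # Assumption: group 0 holds the least significant bits of the input word.
    return [(word >> (i * lane_width)) & mask for i in range(lane_count)]


# A 64-bit input processed in 16-bit mode yields four groups for parallel comparison.
lanes = split_into_lanes(0x1111222233334444, 64, mode=0)
```

When the input width already equals the selected width, the function returns a single group, which corresponds to the case in which the received data is processed directly.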
Fig. 7 is a schematic flowchart of a data processing method according to another embodiment, where after the step of receiving data to be processed, the method further includes:
S105, receiving the number N of the data to be processed and the first addresses of a plurality of storage intervals in the register through a data reading unit.
S106, reading the data according to the first addresses of the storage intervals in the register and the number N of the data, and storing the data into a scalar register array.
Specifically, the scalar register array may store the to-be-processed data read by the data reading unit into the plurality of storage sections in the scalar register array in sequence according to the first addresses of the plurality of storage sections in the register. Alternatively, the scalar register array may store one data to be processed at a time. Optionally, each storage interval may store one piece of data to be processed. Optionally, the number of the storage intervals may be equal to or greater than the number N of the received data to be processed.
Illustratively, if the storage interval corresponding to the head address of the data to be processed is a[0], and the data reading circuit reads in three 16-bit floating point numbers, which are 1011110000000100 → 1, 1011110011000100 → 2, and 1011110001010100 → 3, respectively, the scalar register array may have three register storage intervals to store data: the 1st data is stored at the head address corresponding to storage interval a[0], the 2nd data may be stored at the storage address corresponding to the next storage interval (i.e., a[1]), and the 3rd data may then be stored at the storage address corresponding to the following storage interval (i.e., a[2]).
In the data processing method provided by this embodiment, the data reading unit receives the number of data to be processed and the first addresses of the storage intervals, and the scalar register array receives the data to be processed input by the data reading unit and sequentially stores the received data into the storage intervals according to the first addresses of the storage intervals, so that the data for comparison operation can be input to the multiplexing comparison tree circuit during subsequent cyclic comparison operation; each time, the scalar register array outputs two data for comparison operation, and the cyclic operation continues until the multi-layer cyclic comparison is finished, which can effectively reduce the operation amount and the delay in the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
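The sequential storage in S105 and S106 can be sketched in Python as follows. This is only an illustration: the scalar register array is modeled as a plain list, one datum is written per storage interval, and the function name `store_sequentially` is hypothetical.

```python
from typing import List, Optional


def store_sequentially(registers: List[Optional[int]], head: int, data: List[int]) -> None:
    """Write each datum into consecutive storage intervals starting at `head`."""
    if head < 0 or head + len(data) > len(registers):
        raise ValueError("not enough storage intervals for the data")
    for offset, value in enumerate(data):
        # One datum per storage interval, in the order the data was read.
        registers[head + offset] = value


# The three 16-bit patterns from the example in the text, stored from a[0].
a: List[Optional[int]] = [None] * 4
store_sequentially(
    a, 0, [0b1011110000000100, 0b1011110011000100, 0b1011110001010100]
)
```

The list `a` has four entries here only to show that the number of storage intervals may exceed N; the unused interval stays empty.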
Fig. 8 is a schematic flow chart of a data processing method according to another embodiment, where the performing a multi-level cyclic comparison operation on the data to be processed by the multiplexing comparison tree circuit in S102 includes:
S1021, performing first-layer cyclic comparison operation on the data to be processed through the first-stage multiplexing comparator to obtain a first-layer extreme value comparison result.
It should be noted that the data selection device may input floating point numbers to be processed stored in the scalar register array into the first-stage multiplexing comparator, the scalar register array may input any two different pieces of data to be processed to the first-stage multiplexing comparator every time the first layer performs a comparison operation, and the scalar register array may input another two different floating point numbers to be processed to the first-stage multiplexing comparator next time the layer performs a comparison operation. Optionally, the first-stage multiplexing comparator may perform comparison operation on the two pieces of data to be processed to obtain a maximum value and a minimum value of the two pieces of data. Optionally, the total number of the first-layer loop comparison operations may be equal to 1/2 of the number of the data to be processed received by the data reading circuit. Optionally, the above first-stage multiplexing comparator may perform a cyclic comparison operation on all data to be processed, which may be referred to as a first-layer comparison operation, and after the first-layer comparison operation, a first-layer extreme value comparison result may be obtained, where the first-layer extreme value comparison result may include a first-layer maximum value comparison result and a first-layer minimum value comparison result. Optionally, the first stage multiplexing comparator may perform a first layer of circular comparison operations.
S1022, performing multi-layer cyclic comparison operation on the first-layer extreme value comparison result through a second-stage multiplexing comparator.
Specifically, the comparison result obtained by performing the second-layer cyclic comparison operation on the first-layer extremum comparison result through the second-stage multiplexing comparator may be referred to as a second-layer extremum comparison result. The second-stage multiplexing comparator then continues the cyclic operation layer by layer, each layer of cyclic comparison operation comparing the extremum comparison result of the previous layer, until the extremum comparison result of the last layer is a single datum, at which point the multi-layer cyclic comparison operation is finished and the final extremum comparison result is obtained. The total number of cyclic comparison operations in each layer may be equal to 1/2 of the number of data in the extremum comparison result of the previous layer. Optionally, the total number of layers of the cyclic comparison operation performed by the second-stage multiplexing comparator plus one may be equal to the total number of layers of the cyclic comparison operation performed by the multiplexing comparison tree circuit.
According to the data processing method provided by the embodiment, the number of data to be processed is received through the data reading unit, the data to be processed input by the data reading unit is received through the scalar register array, the received data to be processed are sequentially stored into the storage interval by the scalar register array according to the first address of the storage interval, so that the data for comparison operation are input into the multiplexing comparator during subsequent cyclic comparison operation, each time the scalar register array can output two data for comparison operation, and the cyclic comparison is continuously performed until the cyclic comparison is finished, and the method can effectively reduce the operation amount and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
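The layered comparison in S1021 and S1022 can be sketched in Python as a software model. It assumes that N is a power of two, that adjacent data are paired in each layer (the embodiment allows any two different data), and that Python's built-in `max` and `min` stand in for the multiplexing comparators. The function name `vector_extrema` is hypothetical.

```python
from typing import List, Tuple


def vector_extrema(data: List[float]) -> Tuple[float, float, int]:
    """Return (maximum, minimum, number_of_layers) via layered pairwise comparison."""
    n = len(data)
    if n < 2 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes N is a power of two and at least 2")

    # First layer: each comparison takes two inputs and yields their max and min,
    # so the layer performs N/2 comparisons.
    maxima = [max(data[i], data[i + 1]) for i in range(0, n, 2)]
    minima = [min(data[i], data[i + 1]) for i in range(0, n, 2)]
    layers = 1

    # Later layers: compare the previous layer's results until one datum remains.
    while len(maxima) > 1:
        maxima = [max(maxima[i], maxima[i + 1]) for i in range(0, len(maxima), 2)]
        minima = [min(minima[i], minima[i + 1]) for i in range(0, len(minima), 2)]
        layers += 1

    return maxima[0], minima[0], layers


result = vector_extrema([3.0, -1.5, 7.25, 0.0, 2.0, 9.5, -4.0, 1.0])
```

For eight inputs the model runs one first-layer pass and two further layers, which matches the statement that the layers handled by the second-stage comparator plus one equal the total number of layers.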
Fig. 9 is a schematic flowchart of a data processing method according to another embodiment, where the determining whether the condition for ending the multi-level loop comparison operation is satisfied by the ending determining circuit in S103 includes:
S1031, acquiring, through the ending judgment circuit, the layer number corresponding to the extreme value comparison result obtained by the current comparison operation of the second-stage multiplexing comparator.
Specifically, each layer of cyclic comparison operation is performed through the second-stage multiplexing comparator, and each extremum comparison operation result obtained has a corresponding number. Illustratively, the number corresponding to the comparison operation result obtained by the second-layer extreme value comparison operation is 2, the number corresponding to the comparison operation result obtained by the third-layer comparison operation is 3, and so on; the number corresponding to the comparison operation result obtained by the last-layer comparison operation may be M, and if the number of data received by the data reading circuit is N, M may be equal to log₂N.
S1032, judging whether the multilayer cyclic comparison operation meets the condition of ending the multilayer cyclic comparison operation according to the number of layers of the current extreme value comparison result.
Specifically, the determining unit in the ending determining circuit may determine whether the multi-layer cyclic comparison operation satisfies the condition for ending the multi-layer cyclic comparison operation according to the size relationship between the number of layers corresponding to the extremum comparison result currently obtained by the second-stage multiplexing comparator through the cyclic comparison operation and the total number of layers of cyclic comparison operation that the multiplexing comparison tree circuit needs to perform to obtain the final extremum. Optionally, the condition for ending the multi-layer cyclic comparison operation may be that the number of layers corresponding to the extremum comparison result currently obtained by the second-stage multiplexing comparator is equal to the total number of layers of cyclic comparison operation that the multiplexing comparison tree circuit needs to perform to obtain the final extremum.
In the data processing method provided by this embodiment, the determining unit determines, according to a size relationship between the number of layers corresponding to the extremum comparison result obtained by the current second-stage multiplexing comparator through comparison operation and the number of layers required to perform cyclic comparison operation on the final extremum multiplexing comparison tree circuit, whether the multilayer cyclic comparison operation satisfies a condition for ending the multilayer cyclic comparison operation, and if so, the determining unit may input a high-level logic signal to the extremum register circuit to end the multilayer cyclic comparison operation to obtain an operation result, which may effectively reduce an amount of operation and a delay inside the data selecting device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
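The ending condition in S1031 and S1032 can be sketched in Python as follows, assuming that N is a power of two so that the total number of layers is exactly M = log₂N, and modeling the high-level and low-level logic signals as 1 and 0. The function names `total_layers` and `end_signal` are hypothetical.

```python
def total_layers(n: int) -> int:
    """Total number of layers M = log2(N), assuming N is a power of two."""
    if n < 2 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes N is a power of two and at least 2")
    return n.bit_length() - 1


def end_signal(current_layer: int, n: int) -> int:
    """Return 1 (high level) when the current layer is the last one, else 0 (low level)."""
    return 1 if current_layer == total_layers(n) else 0


# With N = 8 inputs, the comparison ends after layer 3.
signals = [end_signal(layer, 8) for layer in (1, 2, 3)]
```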
Another embodiment provides a data processing method, where after the step in S103 of determining, through the ending determination circuit, whether the condition for ending the multi-layer cyclic comparison operation is satisfied, the method further includes: if not, continuing, through the second-stage multiplexing comparator, to perform comparison operation on the extreme value comparison result obtained by the previous-layer cyclic comparison operation until the extreme value comparison result of the last-layer cyclic comparison operation is a single datum, at which point the operation ends and the vector extreme value is output.
Specifically, if the end judgment circuit judges that the condition for ending the multi-layer cyclic comparison operation is not satisfied after the second-stage multiplexing comparator finishes the current-layer cyclic comparison operation, the data selection device may continue the cyclic comparison operation on the extremum comparison result obtained by the previous-layer comparison operation through the second-stage multiplexing comparator. Optionally, the number of layers for continuing the circular comparison operation may be equal to 1, or may be equal to other positive integers.
In the data processing method provided by this embodiment, the determining unit determines, according to a size relationship between the number of layers corresponding to the extremum comparison result obtained by the comparison operation of the second-stage multiplexing comparator and the number of layers required to perform the cyclic comparison operation on the final extremum multiplexing comparison tree circuit, whether the cyclic comparison operation of the multiple layers satisfies a condition for ending the cyclic comparison operation, if not, the determining unit may continue the cyclic comparison operation on the extremum comparison result obtained by the comparison operation of the previous layer through the second-stage multiplexing comparator until the condition for ending the cyclic comparison operation of the multiple layers is satisfied, and ends the operation and outputs the operation result, which may effectively reduce the amount of operations and the delay inside the data selecting device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
In the data processing method provided by another embodiment, the step in S104 of outputting a vector extremum if the condition for ending the multi-layer cyclic comparison operation is satisfied includes: receiving, through an extreme value register circuit, the logic judgment signal input by the judgment unit, and outputting an operation result according to the logic judgment signal.
Specifically, the extremum register circuit may receive the high-level logic judgment signal input by the judging unit, and may also receive the low-level logic judgment signal input by the judging unit. If the extremum register circuit receives the low-level logic judgment signal, this can indicate that the data selection device needs to continue comparing the comparison result of the previous layer through the second-stage multiplexing comparator. If the extremum register circuit receives the high-level logic judgment signal, the operation can be ended, and the final comparison operation result is output.
In the data processing method provided by this embodiment, the high-level logic judgment signal input by the judgment unit is received, the extreme value register circuit outputs the operation result according to the high-level logic judgment signal, and the process can obtain the extreme value in the data to be processed through multi-layer cyclic comparison operation, thereby effectively reducing the operation amount and the delay inside the data selection device; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
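The output behavior of the extreme value register circuit can be sketched in Python as a small software model. The class name `ExtremumRegister` and its methods are hypothetical; the logic judgment signal is modeled as 1 for high level and 0 for low level.

```python
from typing import Optional, Tuple


class ExtremumRegister:
    """Holds the latest layer's maximum and minimum until told to output them."""

    def __init__(self) -> None:
        self.maximum: Optional[float] = None
        self.minimum: Optional[float] = None

    def store(self, maximum: float, minimum: float) -> None:
        """Record the extremum comparison result of the current layer."""
        self.maximum = maximum
        self.minimum = minimum

    def output(self, judgment_signal: int) -> Optional[Tuple[Optional[float], Optional[float]]]:
        # High level (1): the comparison is finished, so release the result.
        # Low level (0): more layers remain, so nothing is output yet.
        if judgment_signal == 1:
            return (self.maximum, self.minimum)
        return None


reg = ExtremumRegister()
reg.store(9.5, -4.0)
pending = reg.output(0)
final = reg.output(1)
```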
Fig. 10 is a flowchart illustrating a data processing method according to an embodiment, which can be processed by the data selecting apparatus shown in fig. 2, where the embodiment relates to a process of performing multi-layer cyclic comparison operation on a plurality of data to select an extremum. As shown in fig. 10, the method includes:
S201, receiving data to be processed.
S202, gating the data to be processed through a selection circuit, and inputting the gated comparison data into a multiplexing comparison tree circuit.
Specifically, during the cyclic comparison operation, the selection circuit may gate the data to be processed, the gated data may be two pairs, one pair of gated data may be data stored in the data read-in circuit, and the other pair of gated data may be data stored in the extremum register circuit. If the selection circuit receives the high-level logic signal input by the data reading circuit, the selection circuit can gate the data stored in the extreme value register circuit and input any two data stored in the extreme value register circuit to the multiplexing comparison tree circuit, otherwise, the selection circuit can gate the data stored in the data reading circuit and input any two data stored in the data reading circuit to the multiplexing comparison tree circuit.
S203, performing multi-layer cyclic comparison operation on the gated comparison data through a multiplexing comparison tree circuit.
It should be noted that, each time of the comparison operation, the multiplexing comparison tree circuit may compare the two comparison data to obtain an extremum value in the two comparison data, and each time of the comparison operation, the multiplexing comparison tree circuit may receive the two comparison data input by the selection circuit. Optionally, the comparison data may be data to be processed, or may also be data in a comparison result of an extremum in a previous layer.
S204, judging, through an ending judgment circuit, whether the condition for ending the multi-layer cyclic comparison operation is met.
Specifically, after each comparison operation is finished, whether the condition for finishing the multi-layer cyclic comparison operation is currently met can be judged through the finishing judgment circuit.
S205, if the condition for ending the multi-layer cyclic comparison operation is met, outputting a vector extreme value.
Specifically, if the end judgment circuit judges that the comparison operation is ended, the multi-layer cyclic comparison operation can be ended, the comparison operation is stopped, and the final vector extremum is output through the extremum register circuit. Optionally, the vector extremum may be characterized as an extremum in all floating point numbers to be processed received by the data reading circuit.
In the data processing method provided by the embodiment, to-be-processed data is received, the to-be-processed data is input into the multiplexing comparison tree circuit through the selection circuit, the comparison data is subjected to cyclic comparison processing through the multiplexing comparison tree circuit, whether the condition for finishing the multilayer cyclic comparison operation is met or not is judged through the finishing judgment circuit, if the condition for finishing the comparison operation is met, a vector extreme value is output, the extreme value in a plurality of to-be-processed data can be obtained through the multilayer cyclic comparison operation in the process, and the operation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
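The gating in S202 can be sketched in Python as a two-way selection. The sketch assumes that the selection signal is modeled as 1 (high level, select data stored in the extremum register circuit) or 0 (otherwise, select data stored in the data reading circuit), and that an adjacent pair is taken at a given index, whereas the embodiment allows any two data. The function name `gate` is hypothetical.

```python
from typing import List, Tuple


def gate(select_signal: int, read_in: List[float], extremum_regs: List[float],
         index: int) -> Tuple[float, float]:
    """Pick one pair of comparison data from the source chosen by `select_signal`."""
    # High level (1) selects previous-layer results; low level (0) selects fresh input.
    source = extremum_regs if select_signal == 1 else read_in
    return source[index], source[index + 1]


fresh_pair = gate(0, [1.0, 2.0, 3.0, 4.0], [9.0, 8.0], index=2)
previous_pair = gate(1, [1.0, 2.0, 3.0, 4.0], [9.0, 8.0], index=0)
```

The returned pair is what the multiplexing comparison tree circuit would receive for one comparison operation in S203.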
In another embodiment of the data processing method, the gating the data to be processed by the selection circuit in S202, and inputting the gated comparison data into the multiplexing comparison tree circuit includes: and gating the data to be processed by the first selection unit, the second selection unit, the third selection unit and the fourth selection unit, and inputting the gated comparison data into the multiplexing comparison tree circuit.
Specifically, during the comparison operation, the first selection unit, the second selection unit, the third selection unit and the fourth selection unit may gate the data to be processed respectively, the data gated by each selection unit may be two pairs, the data gated by one pair may be data stored in the data reading circuit, and the data gated by the other pair may be data stored in the extremum register circuit. If the selection circuit receives the high-level logic signal input by the data reading circuit, the selection circuit can gate the data stored in the extreme value register circuit and input any two data stored in the extreme value register circuit to the multiplexing comparison tree circuit, otherwise, the selection circuit can gate the data stored in the data reading circuit and input any two data stored in the data reading circuit to the multiplexing comparison tree circuit.
According to the data processing method provided by the embodiment, the data to be processed is gated through the first selection unit, the second selection unit, the third selection unit and the fourth selection unit, and the gated comparison data is input into the multiplexing comparison tree circuit to perform multi-layer cyclic comparison operation; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Fig. 11 shows a data processing method provided by another embodiment, where gating the data to be processed by the first selecting unit, the second selecting unit, the third selecting unit, and the fourth selecting unit, and inputting the gated comparison data into the multiplexing comparison tree circuit includes:
S301, obtaining first comparison data through gating of a first selection unit, and inputting the first comparison data to a first multiplexing comparator.
Specifically, the first selection unit may gate the first comparison data, and the first comparison data may be data stored in the extremum register circuit or data stored in the data reading circuit.
S302, second comparison data are obtained through gating of the second selection unit, and the second comparison data are input into the first multiplexing comparator.
Specifically, the second selection unit may gate the second comparison data, and the second comparison data may be data stored in the extremum register circuit or data stored in the data reading circuit.
S303, obtaining third comparison data through gating of a third selection unit, and inputting the third comparison data to a second multiplexing comparator.
Specifically, the third selection unit may gate the third comparison data, and the third comparison data may be data stored in the extremum register circuit or data stored in the data reading circuit.
S304, obtaining fourth comparison data through gating of the fourth selection unit, and inputting the fourth comparison data to the second multiplexing comparator.
Specifically, the fourth selection unit may gate fourth comparison data, where the fourth comparison data may be data stored in the extremum register circuit or data stored in the data reading circuit. It should be noted that the gated comparison data only needs to be subjected to the cyclic comparison operation by the first multiplexing comparator or the second multiplexing comparator, and does not need to be subjected to the cyclic comparison operation by the third multiplexing comparator or the fourth multiplexing comparator.
According to the data processing method provided by the embodiment, the data to be processed is gated through the first selection unit, the second selection unit, the third selection unit and the fourth selection unit, and the gated comparison data is input into the multiplexing comparison tree circuit to perform multi-layer cyclic comparison operation; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Fig. 12 shows a data processing method provided by another embodiment, where in S203, performing a multi-layer cyclic comparison operation on the gated comparison data through a multiplexing comparison tree circuit includes:
S2031, performing first-layer cyclic comparison operation on the first comparison data and the second comparison data through a first multiplexing comparator to obtain a first-layer extreme value comparison result.
It should be noted that, the first multiplexing comparator may perform first-layer cyclic comparison operation on the first comparison data and the second comparison data received after gating, so as to obtain a first-layer extremum comparison result. Optionally, the first-layer extreme value comparison result may include a first-layer maximum value comparison result, and may further include a first-layer minimum value comparison result.
S2032, performing first-layer cyclic comparison operation on the third comparison data and the fourth comparison data through a second multiplexing comparator to obtain a first-layer extreme value comparison result.
It should be noted that the second multiplexing comparator may perform the first-layer cyclic comparison operation on the third comparison data and the fourth comparison data received after the gating.
S2033, performing second-layer cyclic comparison operation on the first-layer extreme value comparison result through the third multiplexing comparator and the fourth multiplexing comparator to obtain a second-layer extreme value comparison result.
Specifically, the third multiplexing comparator can perform second-layer circular comparison operation on the first-layer maximum comparison result to obtain a second-layer maximum comparison result, and the fourth multiplexing comparator can perform second-layer circular comparison operation on the first-layer minimum comparison result to obtain a second-layer minimum comparison result.
S2034, alternately performing, through the first and second multiplexing comparators and through the third and fourth multiplexing comparators, the multi-layer cyclic comparison operation on the extreme value comparison result of the previous layer.
Specifically, the first multiplexing comparator and the second multiplexing comparator may perform the first-layer cyclic comparison operation on the comparison data, and the third multiplexing comparator and the fourth multiplexing comparator may perform the second-layer cyclic comparison operation on the first-layer extreme value comparison result. The first and second multiplexing comparators then perform the third-layer cyclic comparison operation on the second-layer extreme value comparison result, and the third and fourth multiplexing comparators perform the fourth-layer cyclic comparison operation on the resulting third-layer extreme value comparison result. In this way, the two pairs of comparators take turns performing the multi-layer cyclic comparison operation on the extreme value comparison result of the previous layer.
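The alternation described in S2031 to S2034 can be sketched in software. The following is only an illustrative model, not the circuit itself: the function names (`compare`, `reduce_layer`, `alternating_extrema`) are hypothetical, and the handling of an unpaired value (carrying it to the next layer unchanged) is an assumption that the text does not specify.

```python
def compare(a, b):
    """Model of one multiplexing comparator: returns (maximum, minimum)."""
    return (a, b) if a >= b else (b, a)


def reduce_layer(values, keep_max):
    """Run one comparison layer over neighbouring pairs of `values`."""
    out = []
    for i in range(0, len(values) - 1, 2):
        hi, lo = compare(values[i], values[i + 1])
        out.append(hi if keep_max else lo)
    if len(values) % 2 == 1:
        out.append(values[-1])  # assumption: unpaired value passes through
    return out


def alternating_extrema(data):
    """Reduce `data` layer by layer, alternating between two comparator pairs.

    Returns (maximum, minimum, schedule); schedule[i] names the comparator
    pair assumed to execute layer i + 1.
    """
    if not data:
        raise ValueError("data must not be empty")
    maxima, minima = list(data), list(data)
    schedule = []
    layer = 0
    while len(maxima) > 1:
        layer += 1
        # Odd layers: comparators 1 and 2; even layers: comparators 3 and 4.
        schedule.append("comparators 1+2" if layer % 2 == 1 else "comparators 3+4")
        maxima = reduce_layer(maxima, keep_max=True)
        minima = reduce_layer(minima, keep_max=False)
    return maxima[0], minima[0], schedule


maximum, minimum, schedule = alternating_extrema([3, 9, 1, 7, 5])
print(maximum, minimum, schedule)
```

For five inputs the sketch needs three layers, so the first pair of comparators is used twice and the second pair once.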
In the data processing method provided by the embodiment, the first multiplexing comparator, the second multiplexing comparator, the third multiplexing comparator and the fourth multiplexing comparator are used for continuously performing multilayer cyclic comparison operation, and the process can obtain extreme values in a plurality of data to be processed through the multilayer cyclic comparison operation, so that the operation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of the AI chip occupied by the data selection device is effectively reduced.
Another embodiment provides a data processing method in which, after it is determined in S204 by the end judgment circuit whether the condition for ending the multi-layer cyclic comparison operation is satisfied, the method further includes: if not, alternately performing, through the first and second multiplexing comparators and through the third and fourth multiplexing comparators, the multi-layer cyclic comparison operation on the extreme value comparison result of the previous layer.
Specifically, if the end judgment circuit judges that the condition for ending the multi-layer cyclic comparison operation is not satisfied after the first and second multiplexing comparators or the third and fourth multiplexing comparators finish the current-layer cyclic comparison operation, the data selection device may continue to perform the cyclic comparison operation on the extreme value comparison result obtained by the previous-layer cyclic comparison operation through the first and second multiplexing comparators and the third and fourth multiplexing comparators. Optionally, the number of layers for which the cyclic comparison operation continues may be equal to 1, or may be equal to another positive integer.
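The end judgment can be illustrated with a small loop. This is a sketch under stated assumptions: the ending condition is modelled as "the current layer count has reached the total number of layers", and the total is assumed to be the number of pairwise halvings needed to reduce N values to one. The names `total_layers`, `end_judgment` and `run_until_end` are hypothetical, and only the maximum is tracked to keep the example short.

```python
def total_layers(n):
    """Layers needed to reduce n values pairwise to one (assumed ceil(log2 n))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def end_judgment(current_layer, n):
    """Model of the end judgment circuit: True when the comparison may stop."""
    return current_layer >= total_layers(n)


def run_until_end(data):
    """Repeat comparison layers until the end judgment is satisfied."""
    values = list(data)
    layer = 0
    while not end_judgment(layer, len(data)):
        # One layer: keep the larger value of each neighbouring pair.
        values = [max(values[i:i + 2]) for i in range(0, len(values), 2)]
        layer += 1
    return values[0], layer


print(run_until_end([4, 8, 2, 6, 1]))
```

With five inputs the loop runs three layers before the end judgment is satisfied.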
In the data processing method provided by the embodiment, the first multiplexing comparator, the second multiplexing comparator, the third multiplexing comparator and the fourth multiplexing comparator are used for continuously performing multilayer cyclic comparison operation, and the process can obtain extreme values in a plurality of data to be processed through the multilayer cyclic comparison operation, so that the operation amount and the delay in the data selection device are effectively reduced; in addition, the data selection device can process various data comparison operations with different bit widths according to different function selection mode signals received by the multiplexing comparison tree circuit, and the area of an AI chip occupied by the data selection device is effectively reduced.
To facilitate understanding by those skilled in the art, the data processing method provided by the present invention is described by taking as an example the case where the data selection device can process 16-bit floating point numbers and receives N 32-bit floating point numbers. The specific method includes:
S401, receiving, through the data reading circuit, the number N of floating point numbers and the first addresses A of a plurality of storage intervals in the register, and, according to the first addresses A of the plurality of storage intervals in the register and the number N of floating point numbers, reading the N floating point numbers (namely A = (a_1, a_2, …, a_N)) and storing them into the storage intervals;
S402, inputting, by the data reading circuit, the floating point numbers stored in the plurality of storage intervals to the first-stage multiplexing comparator according to the first addresses A of the storage intervals in the register;
S403, performing cyclic comparison operation on the N floating point numbers through the first-stage multiplexing comparator to obtain a first-layer extreme value comparison result, and storing the first-layer extreme value comparison result into the extreme value register circuit.
S404, performing multi-layer cyclic comparison operation on the extreme value comparison result stored in the extreme value register circuit through the second-stage multiplexing comparator.
S405, outputting an operation result through the extremum register circuit after the multi-layer cyclic comparison is finished.
It should be noted that, during the comparison operation, the first-stage multiplexing comparator and the second-stage multiplexing comparator may perform the cyclic comparison separately on the upper 16 bits and the lower 16 bits of the received floating point numbers; that is, the 32-bit floating point vector A may be divided into the upper 16 bits A1 = (a_1[31:16], a_2[31:16], …, a_N[31:16]) and the lower 16 bits A2 = (a_1[15:0], a_2[15:0], …, a_N[15:0]).
Optionally, the operation result may be obtained by splicing the extreme values of vectors A1 and A2, that is, A_max = {max(A1), max(A2)} and A_min = {min(A1), min(A2)}. Optionally, the upper and lower 16 bits of the maximum value A_max may be the upper and lower 16 bits of the same 32-bit floating point number, or may be spliced from the upper and lower 16 bits of different 32-bit floating point numbers. Optionally, the upper and lower 16 bits of the minimum value A_min may be the upper and lower 16 bits of the same 32-bit floating point number, or may be spliced from the upper and lower 16 bits of different 32-bit floating point numbers.
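The splitting and splicing can be illustrated as follows. This is a minimal sketch and not the comparison logic of Table 1: it assumes that each 16-bit half is compared as an IEEE 754 half-precision value, and the helper names (`half_from_bits`, `split_lanes`, `lane_extrema`) are hypothetical. The 32-bit words are handled as raw bit patterns.

```python
import struct


def half_from_bits(bits):
    """Interpret a 16-bit pattern as an IEEE 754 half-precision value (assumption)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def split_lanes(words):
    """Split 32-bit words into A1 (bits [31:16]) and A2 (bits [15:0])."""
    a1 = [(w >> 16) & 0xFFFF for w in words]
    a2 = [w & 0xFFFF for w in words]
    return a1, a2


def lane_extrema(words):
    """Return (A_max, A_min) as spliced 32-bit patterns {ext(A1), ext(A2)}."""
    a1, a2 = split_lanes(words)
    a_max = (max(a1, key=half_from_bits) << 16) | max(a2, key=half_from_bits)
    a_min = (min(a1, key=half_from_bits) << 16) | min(a2, key=half_from_bits)
    return a_max, a_min


# Halves used below: 0x3800 = 0.5, 0x3C00 = 1.0, 0x4000 = 2.0, 0x4200 = 3.0
words = [0x3C004000, 0x40003C00, 0x38004200]
a_max, a_min = lane_extrema(words)
print(hex(a_max), hex(a_min))
```

In this example the upper half of A_max comes from the second word and the lower half from the third word, which illustrates the case where the result is spliced from different 32-bit numbers.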
In addition, if the two floating point numbers to be processed received by the first-stage multiplexing comparator and the second-stage multiplexing comparator are a and b, in the comparison operation process, the output port of the judgment result of each unit in the first-stage multiplexing comparator and the second-stage multiplexing comparator is two-bit valid, namely, the high level and the low level are both valid, and the specific comparison condition is shown in table 1:
TABLE 1
Figure BDA0001886576860000431
The execution process of S201 to S205 may specifically refer to the description of the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
In addition, if the data selection device can process 16-bit floating point numbers and the data selection device receives N 16-bit floating point numbers, the specific method of the present invention is described as follows:
S501, receiving, through the data reading circuit, the number N of floating point numbers and the first addresses A of a plurality of storage intervals in the register, and, according to the first addresses A of the plurality of storage intervals in the register and the number N of floating point numbers, reading the N floating point numbers (namely B = (b_1, b_2, ..., b_N)) and storing them into the storage intervals;
S502, inputting, by the data reading circuit, the floating point numbers stored in the plurality of storage intervals to the first-stage multiplexing comparator according to the first addresses A of the storage intervals in the register;
S503, performing cyclic comparison operation on the N floating point numbers through the first-stage multiplexing comparator to obtain a first-layer extreme value comparison result, and storing the first-layer extreme value comparison result into the extreme value register circuit.
S504, carrying out multilayer circulation comparison operation on the extreme value comparison result stored in the extreme value register circuit through the second-stage multiplexing comparator.
S505, after the multi-layer cyclic comparison is finished, outputting an operation result through the extremum register circuit.
Optionally, the operation result may be B_max = {max(b_1, b_2, ..., b_N)} and B_min = {min(b_1, b_2, ..., b_N)}.
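Steps S501 to S505 can be modelled as a two-stage reduction. The sketch below is illustrative only: it assumes that the first stage compares neighbouring pairs to produce a maximum vector and a minimum vector, and that the second stage reduces each vector separately. The function names and the pass-through of an unpaired value are hypothetical.

```python
def first_stage(values):
    """First-stage comparator model: pairwise compare, return (max_vec, min_vec)."""
    max_vec, min_vec = [], []
    for i in range(0, len(values) - 1, 2):
        a, b = values[i], values[i + 1]
        max_vec.append(a if a >= b else b)
        min_vec.append(b if a >= b else a)
    if len(values) % 2 == 1:
        # Assumption: an unpaired value is a candidate for both extremes.
        max_vec.append(values[-1])
        min_vec.append(values[-1])
    return max_vec, min_vec


def second_stage(vec, pick_max):
    """Second-stage comparator model: reduce a vector to one extreme value."""
    while len(vec) > 1:
        nxt = []
        for i in range(0, len(vec) - 1, 2):
            a, b = vec[i], vec[i + 1]
            nxt.append(max(a, b) if pick_max else min(a, b))
        if len(vec) % 2 == 1:
            nxt.append(vec[-1])
        vec = nxt
    return vec[0]


def select_extrema(values):
    """Return (B_max, B_min) for a non-empty list of values."""
    if not values:
        raise ValueError("values must not be empty")
    max_vec, min_vec = first_stage(values)
    return second_stage(max_vec, True), second_stage(min_vec, False)


print(select_extrema([2.5, -1.0, 7.25, 0.5, 3.0]))
```

Comparing pairs first means the maximum search never revisits values that already lost a comparison, and likewise for the minimum.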
In addition, if the two floating point numbers to be processed received by the first-stage multiplexing comparator and the second-stage multiplexing comparator are a and b, in the comparison operation process, the output port of the judgment result of each unit in the first-stage multiplexing comparator and the second-stage multiplexing comparator is one-bit valid, that is, the high level or the low level is valid, and if the low level is valid, the specific comparison condition is shown in table 2:
TABLE 2
Figure BDA0001886576860000441
For the implementation process of S301 to S305, reference may be specifically made to the description of the foregoing embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the application also provides a machine learning arithmetic device, which comprises one or more data selection devices mentioned in the application, and is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning arithmetic, and transmitting the execution result to peripheral equipment through an I/O interface. The peripheral equipment includes, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one data selection device is included, the data selection devices can be linked and transmit data through a specific structure, for example, the data selection devices are interconnected and transmit data through a PCIE bus, so as to support a larger-scale machine learning operation. In this case, the data selection devices may share the same control system or have separate control systems; they may share a memory, or each accelerator may have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has higher compatibility and can be connected with various types of servers through PCIE interfaces.
The embodiment of the application also provides a combined processing device which comprises the machine learning arithmetic device, the universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 13 is a schematic view of the combined processing device.
Other processing devices include one or more types of general purpose/special purpose processors such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), neural network processors, and the like. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control; they perform data transport and complete basic control of the machine learning arithmetic device, such as starting and stopping. The other processing devices can also cooperate with the machine learning arithmetic device to complete calculation tasks.
The universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device obtains the required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on the machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Alternatively, as shown in fig. 14, the configuration may further include a storage device, and the storage device is connected to the machine learning arithmetic device and the other processing device, respectively. The storage device is used for storing data in the machine learning arithmetic device and the other processing devices, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can be used as a system on chip (SOC) for equipment such as mobile phones, robots, unmanned aerial vehicles and video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card or a Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or the combined processing device.
In some embodiments, a chip package structure is provided, which includes the chip.
In some embodiments, a board card is provided, which includes the chip packaging structure. As shown in Fig. 15, in addition to the chip 389, the board card may include other components, including but not limited to: a memory device 390, a receiving device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the memory cells is connected with the chip through a bus. It can be understood that each group of the memory cells may be DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of a clock pulse. DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 sets of the storage unit. Each group of the memory cells may include a plurality of DDR4 particles (chips). In one embodiment, the chip may internally include 4 72-bit DDR4 controllers, and 64 bits of the 72-bit DDR4 controller are used for data transmission, and 8 bits are used for ECC check. It can be understood that when DDR4-3200 particles are adopted in each group of memory cells, the theoretical bandwidth of data transmission can reach 25600 MB/s.
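The 25600 MB/s figure follows from the transfer rate and the data width. A short calculation, with a hypothetical helper name, makes the arithmetic explicit; it assumes that DDR4-3200 means 3200 million transfers per second and that only the 64 data bits of the 72-bit controller carry payload.

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s, data_bits):
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200: 3200 MT/s; 64 of the 72 controller bits carry data, 8 carry ECC.
print(ddr_bandwidth_mb_s(3200, 64))
```

The 8 ECC bits are excluded because they protect the data rather than add to the usable payload.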
In one embodiment, each group of the memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And arranging a controller for controlling DDR in the chip, wherein the controller is used for controlling data transmission and data storage of each storage unit.
The receiving device is electrically connected with the chip in the chip packaging structure. The receiving device is used for realizing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface. For example, the data to be processed is transmitted to the chip by the server through the standard PCIE interface, so that data transfer is implemented. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another interface; the present application does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g., a server) by the receiving device.
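The 16000 MB/s figure can be reproduced from the per-lane signalling rate. The sketch below uses a hypothetical helper and relies on two facts about PCIE 3.0 that are not stated in the text: each lane signals at 8 GT/s, and the link uses 128b/130b encoding. The quoted value corresponds to the raw rate before encoding overhead.

```python
def pcie_bandwidth_mb_s(gigatransfers_per_s, lanes, payload_bits=1, encoded_bits=1):
    """One-direction bandwidth in MB/s, optionally scaled by line-code efficiency."""
    raw = gigatransfers_per_s * 1000 * lanes / 8  # one bit per transfer per lane
    return raw * payload_bits / encoded_bits


print(pcie_bandwidth_mb_s(8, 16))                  # raw rate, no encoding overhead
print(round(pcie_bandwidth_mb_s(8, 16, 128, 130))) # after 128b/130b encoding
```

The second value is slightly lower because 2 of every 130 transmitted bits are encoding overhead.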
The control device is electrically connected with the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (Micro Controller Unit, MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores and/or the plurality of processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a data processor, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or an automobile; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance instrument, a B ultrasonic instrument and/or an electrocardiograph.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of circuit combinations, but it should be understood by those skilled in the art that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments, and the devices and modules involved are not necessarily essential to the application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the present invention. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these changes and modifications are all within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (23)

1. A data selection apparatus, characterized in that the data selection apparatus comprises: the system comprises a data read-in circuit, a multiplexing comparison tree circuit, an extreme value register circuit and an ending judgment circuit; the output end of the data read-in circuit is connected with the first input end of the multiplexing comparison tree circuit, the first output end of the multiplexing comparison tree circuit is connected with the first input end of the extreme value register circuit, the first output end of the extreme value register circuit is connected with the input end of the ending judgment circuit, the output end of the ending judgment circuit is connected with the second input end of the extreme value register circuit, and the second output end of the extreme value register circuit is connected with the second input end of the multiplexing comparison tree circuit;
the data reading circuit is used for receiving the number N of data and the initial addresses of a plurality of storage intervals in the register, reading the data according to the initial addresses of the plurality of storage intervals in the register and the number N of the data, the multiplexing comparison tree circuit is used for carrying out multi-layer cycle comparison on the size of the received plurality of data, the extreme value register circuit is used for storing an extreme value obtained by each layer of cycle comparison, and the ending judgment circuit is used for judging whether the multi-layer cycle comparison processing is ended or not;
the multiplexing comparison tree circuit comprises a function selection mode signal input end, and the function selection mode signal input end is used for receiving an input function selection mode signal; the function selection mode signal is used to determine the bit width of the data processed by the data selection device.
2. The data selection device of claim 1, wherein the data read-in circuit comprises: a data reading unit and a scalar register array, wherein the output end of the data reading unit is connected with the input end of the scalar register array;
the data reading unit is used for receiving the number N of the data and the first addresses of a plurality of storage intervals in the register and reading in the data according to the first addresses of the plurality of storage intervals in the register and the number N of the data, and the scalar register array is used for storing the data read in by the data reading unit according to the addresses of the plurality of storage intervals in the register.
3. The data selection device according to claim 2, wherein the data reading unit in the data reading-in circuit includes: the data input port is used for reading in the data according to the initial addresses of a plurality of storage intervals in the register and the number N of the data, the data number and initial address input port is used for receiving the number N of the read-in data and the initial addresses of the plurality of storage intervals in the register, and the data output port is used for outputting the read-in data;
the scalar register array in the data read-in circuit includes: the data input port is used for receiving N data, the first data output port is used for outputting the data stored in each register storage interval during each comparison operation, the second data output port is used for outputting the data stored in each register storage interval, and the residual data output port is used for outputting the residual data stored in the register storage interval during the comparison operation.
4. The data selection apparatus of claim 1, wherein the multiplexing comparison tree circuit comprises: the device comprises a first-stage multiplexing comparator and a second-stage multiplexing comparator, wherein the first-stage multiplexing comparator is used for comparing two data to obtain an extreme value, and the second-stage multiplexing comparator is used for comparing the two data to obtain the extreme value.
5. The data selection apparatus of claim 4, wherein the first stage multiplexing comparator in the multiplexing comparison tree circuit comprises: the multiplexing comparator is used for performing cyclic comparison operation on the data stored in the storage interval of the register to obtain a maximum value vector and a minimum value vector; the second stage multiplexing comparator in the multiplexing compare tree circuit comprises: the device comprises a first multiplexing comparator and a second multiplexing comparator, wherein the first multiplexing comparator is used for comparing two data to obtain a maximum value, and the second multiplexing comparator is used for comparing the two data to obtain a minimum value.
6. The data selection device of claim 5, wherein the multiplexing comparator, the first multiplexing comparator or the second multiplexing comparator comprises: the data processing device comprises a function selection mode signal input port, a first data input port, a second data input port, a maximum output port and a minimum output port, wherein the function selection mode signal input port is used for receiving a function selection mode signal corresponding to data with different bit widths to be processed, the first data input port is used for receiving the input first data, the second data input port is used for receiving the input second data, the maximum output port is used for outputting a maximum value after each data comparison operation, and the minimum output port is used for outputting a minimum value after each data comparison operation.
7. The data selection apparatus of claim 1, wherein the extremum register circuit comprises: the system comprises a maximum register file and a minimum register file, wherein the maximum register file is used for storing a maximum value obtained by multilayer cyclic comparison operation, and the minimum register file is used for storing a minimum value obtained by multilayer cyclic comparison operation.
8. The data selection device of claim 7, wherein the maximum register file in the extremum register circuitry comprises: a first maximum output port, a second maximum output port, a first maximum input port, a third maximum output port, a comparison level output port, a judgment result input port, a remaining data input port, and a second maximum input port, wherein the first maximum output port is used for outputting a first maximum, the second maximum output port is used for outputting a second maximum, the first maximum input port is used for receiving a maximum obtained by next comparison operation, the third maximum output port is used for outputting a maximum of a plurality of data, the comparison level output port is used for outputting the number of layers currently subjected to the cyclic comparison operation by the multiplexing comparison tree circuit, the judgment result input port is used for receiving a logic judgment signal, the remaining data input port is used for receiving the remaining data stored in the register storage section during the comparison operation, the second maximum input port is used for receiving a maximum value obtained after each data comparison operation;
the minimum register file in the extremum register circuit includes: a first minimum output port, a second minimum output port, a first minimum input port, a third minimum output port, a comparison level output port, a judgment result input port, a remaining data input port, and a second minimum input port, where the first minimum output port is used to output a first minimum, the second minimum output port is used to output a second minimum, the first minimum input port is used to receive a minimum obtained by a next comparison operation, the third minimum output port is used to output a minimum of a plurality of data, the comparison level output port is used to output the number of layers currently being compared by a second multiplexing comparator, the judgment result input port is used to receive a logic judgment signal output by a judgment ending circuit, and the remaining data input port is used to receive the remaining data stored in a register storage interval during the comparison operation, the second minimum input port is used for receiving a minimum obtained after each data comparison operation.
9. The data selection apparatus according to claim 1, wherein the end judgment circuit includes: and the judging unit is used for judging the number of layers of the current extreme value comparison result and the total number of layers of the multiplexing comparison tree circuit which needs to carry out cyclic comparison operation to obtain the final extreme value.
10. The data selection apparatus according to claim 9, wherein the judging unit includes: the comparison level input port is used for receiving the number of layers corresponding to the currently obtained cyclic comparison result of the extremum register circuit, and the judgment result output port is used for outputting the comparison result of the number of layers corresponding to the current extremum comparison result and the total number of layers of the multiplexing comparison tree circuit which need to be subjected to cyclic comparison operation.
11. A method of data processing, the method comprising:
receiving data to be processed and a function selection mode signal;
performing multi-layer cyclic comparison operation on the data to be processed through a multiplexing comparison tree circuit;
judging whether the condition for finishing the multilayer circulation comparison operation is met or not through a finishing judgment circuit;
if the condition of finishing the multilayer cyclic comparison operation is met, outputting a vector extreme value;
wherein the performing the multi-level cyclic comparison operation on the data to be processed through the multiplexing comparison tree circuit comprises:
and performing multi-layer cyclic comparison operation on the data to be processed according to the function selection mode signal based on the multiplexing comparison tree circuit.
12. The method of claim 11, after receiving the data to be processed, further comprising:
receiving the number N of the data to be processed and the first addresses of a plurality of storage intervals in a register through a data reading unit;
reading the data according to the first addresses of a plurality of storage intervals in the register and the number N of the data, and storing the data into a scalar register array.
13. The method of claim 11, wherein the multiplexing compare tree circuit performs a multi-level circular compare operation on the data to be processed, comprising:
performing first-layer cyclic comparison operation on the data to be processed through a first-stage multiplexing comparator to obtain a first-layer extreme value comparison result;
and carrying out multi-layer cyclic comparison operation on the first-layer extreme value comparison result through a second-stage multiplexing comparator.
14. The method of claim 11, wherein judging, through the ending judgment circuit, whether the condition for ending the multi-layer cyclic comparison operation is met comprises:
acquiring, through the ending judgment circuit, the number of layers corresponding to the extreme value comparison result obtained by the current comparison operation of the second-stage multiplexing comparator;
judging, according to the number of layers corresponding to the current extreme value comparison result, whether the condition for ending the multi-layer cyclic comparison operation is met.
15. The method of claim 14, wherein, after judging through the ending judgment circuit whether the condition for ending the multi-layer cyclic comparison operation is met, the method further comprises: if the condition is not met, continuing to use the second-stage multiplexing comparator to perform the comparison operation on the extreme value comparison result obtained by the previous layer of cyclic comparison operation, until the extreme value comparison result of the cyclic comparison operation is a single datum, whereupon the operation ends and the vector extreme value is output.
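The interplay of claims 13 to 15 (a first-stage comparator for the first layer, a reused second-stage comparator for later layers, and an ending judgment based on the layer count) can be sketched in Python as follows. The sketch assumes that the total number of layers for N inputs is ceil(log2 N) and that the mode signal is the string "max" or "min"; both are illustrative assumptions, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def compare_layer(values: Sequence[float], mode: str) -> List[float]:
    """One layer of pairwise comparison; an unpaired element is carried over."""
    if mode not in ("max", "min"):
        raise ValueError(f"unknown function selection mode: {mode!r}")
    pick = max if mode == "max" else min
    out = [pick(values[i], values[i + 1])
           for i in range(0, len(values) - 1, 2)]
    if len(values) % 2:
        out.append(values[-1])
    return out


def run_two_stage(data: Sequence[float], mode: str) -> Tuple[float, int]:
    """Return (vector extreme value, number of layers executed)."""
    if not data:
        raise ValueError("no data to be processed")
    # Assumed total layer count: ceil(log2(N)), computed with integers.
    total_layers = (len(data) - 1).bit_length()
    current = list(data)
    if total_layers == 0:
        return current[0], 0
    # First layer: models the first-stage multiplexing comparator.
    current = compare_layer(current, mode)
    layer = 1
    # Ending judgment: compare the current layer number with the total.
    while layer != total_layers:
        # Later layers: models reuse of the second-stage comparator.
        current = compare_layer(current, mode)
        layer += 1
    return current[0], layer
```

For five inputs the assumed total is three layers, so the second-stage step runs twice after the first layer before the ending judgment stops the loop.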
16. The method of claim 11, wherein outputting the vector extreme value if the condition for ending the multi-layer cyclic comparison operation is met comprises: receiving, through an extreme value register circuit, the logic judgment signal input by the judging unit, and outputting an operation result according to the logic judgment signal.
17. A machine learning operation device, comprising one or more data selection devices according to any one of claims 1 to 10, and used for acquiring input data to be operated on and control information from other processing devices, performing a specified machine learning operation, and transmitting the execution result to the other processing devices through an I/O interface;
when the machine learning operation device comprises a plurality of the data selection devices, the plurality of data selection devices can be connected through a specific structure and transmit data;
wherein the plurality of data selection devices are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operations; the plurality of data selection devices share the same control system or have their own control systems; the plurality of data selection devices share a memory or have their own memories; and the interconnection mode of the plurality of data selection devices is any interconnection topology.
18. A combined processing device, characterized in that the combined processing device comprises the machine learning operation device according to claim 17, a universal interconnection interface, and other processing devices;
wherein the machine learning operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.
19. The combined processing device according to claim 18, further comprising a storage device, wherein the storage device is connected to the machine learning operation device and the other processing devices respectively, and is used for storing data of the machine learning operation device and the other processing devices.
20. A neural network chip, comprising the machine learning operation device of claim 17, the combined processing device of claim 18, or the combined processing device of claim 19.
21. An electronic device, characterized in that the electronic device comprises the neural network chip of claim 20.
22. A board card, the board card comprising: a storage device, a receiving device, a control device, and the neural network chip of claim 20;
wherein the neural network chip is connected to the storage device, the control device, and the receiving device respectively;
the storage device is used for storing data;
the receiving device is used for realizing data transmission between the chip and external equipment;
the control device is used for monitoring the state of the chip.
23. The board card of claim 22, wherein
the storage device comprises a plurality of groups of storage intervals, each group of storage intervals being connected to the chip through a bus, and the storage intervals being DDR SDRAM;
the chip comprises a DDR controller used for controlling data transmission and data storage of each storage interval;
the receiving device is a standard PCIE interface.
CN201811450573.9A 2018-11-30 2018-11-30 Data selection device, data processing method, chip and electronic equipment Active CN111258632B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811450573.9A CN111258632B (en) 2018-11-30 2018-11-30 Data selection device, data processing method, chip and electronic equipment
PCT/CN2019/120994 WO2020108486A1 (en) 2018-11-30 2019-11-26 Data processing apparatus and method, chip, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811450573.9A CN111258632B (en) 2018-11-30 2018-11-30 Data selection device, data processing method, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN111258632A CN111258632A (en) 2020-06-09
CN111258632B CN111258632B (en) 2022-07-26

Family

ID=70951845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811450573.9A Active CN111258632B (en) 2018-11-30 2018-11-30 Data selection device, data processing method, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN111258632B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102594758A (en) * 2011-01-11 2012-07-18 上海华虹集成电路有限责任公司 Synchronous estimating device and synchronous estimating method for fine timing
CN108027729A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Segmented instruction block

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SU960796A1 (en) * 1980-03-19 1982-09-23 Предприятие П/Я Р-6429 Device for determination of extremal values
CN1241749A (en) * 1998-07-09 2000-01-19 北京多思科技工业园股份有限公司 Multifunctional data comparing method and device
CN101882127B (en) * 2010-06-02 2011-11-09 湖南大学 Multi-core processor
US9785434B2 (en) * 2011-09-23 2017-10-10 Qualcomm Incorporated Fast minimum and maximum searching instruction
US10379854B2 (en) * 2016-12-22 2019-08-13 Intel Corporation Processor instructions for determining two minimum and two maximum values
CN108564169B (en) * 2017-04-11 2020-07-14 上海兆芯集成电路有限公司 Hardware processing unit, neural network unit, and computer usable medium
US10338919B2 (en) * 2017-05-08 2019-07-02 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
CN107301031B (en) * 2017-06-15 2020-08-04 西安微电子技术研究所 Normalized floating point data screening circuit

Also Published As

Publication number Publication date
CN111258632A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN110059797B (en) Computing device and related product
CN109753319B (en) Device for releasing dynamic link library and related product
CN111047022A (en) Computing device and related product
CN110059809B (en) Computing device and related product
CN111260042B (en) Data selector, data processing method, chip and electronic equipment
CN111260043B (en) Data selector, data processing method, chip and electronic equipment
CN111258541B (en) Multiplier, data processing method, chip and electronic equipment
CN111258632B (en) Data selection device, data processing method, chip and electronic equipment
CN111258634B (en) Data selection device, data processing method, chip and electronic equipment
CN209895329U (en) Multiplier and method for generating a digital signal
CN111340229B (en) Data selector, data processing method, chip and electronic equipment
CN111382853B (en) Data processing device, method, chip and electronic equipment
CN110515586B (en) Multiplier, data processing method, chip and electronic equipment
CN111078625B (en) Network-on-chip processing system and network-on-chip data processing method
CN111368987B (en) Neural network computing device and method
CN111384944B (en) Full adder, half adder, data processing method, chip and electronic equipment
CN111260044B (en) Data comparator, data processing method, chip and electronic equipment
CN111258534B (en) Data comparator, data processing method, chip and electronic equipment
CN113031916A (en) Multiplier, data processing method, device and chip
CN112395003A (en) Operation method, device and related product
CN111340202A (en) Operation method, device and related product
CN111368990A (en) Neural network computing device and method
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN209962284U (en) Multiplier, device, chip and electronic equipment
CN110378477B (en) Multiplier, data processing method, chip and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant