CN109324826B - Counting device and counting method

Counting device and counting method

Info

Publication number
CN109324826B
Authority
CN
China
Prior art keywords
counting
instruction
unit
input data
counted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811097569.9A
Other languages
Chinese (zh)
Other versions
CN109324826A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority claimed from CN201880000923.3A (CN109121435A)
Publication of CN109324826A
Priority to US16/697,687 (US11734002B2)
Application granted
Publication of CN109324826B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/501 Half or full adders, i.e. basic adder cells for one denomination

Abstract

A counting device and a counting method. The counting device comprises a storage unit, a counting unit and a register unit. The storage unit is connected with the counting unit and stores the input data to be counted and the counted number of elements in the input data that satisfy a given condition; the register unit stores the address at which the input data to be counted is held in the storage unit; and the counting unit is connected with the register unit and is used for acquiring a counting instruction, reading the storage address of the input data to be counted from the register unit according to the counting instruction, acquiring the corresponding input data to be counted from the storage unit, and counting the number of elements in that data that satisfy the given condition to obtain a counting result. In the counting device and the counting method, calculation efficiency can be improved by expressing the algorithm that counts the elements satisfying a given condition in the input data in the form of an instruction.

Description

Counting device and counting method
The present application is a divisional application of Chinese patent application No. 201880000923.3, and the content of the parent application is incorporated herein by reference.
Technical Field
The disclosure relates to the field of computers, and further relates to a counting device and a counting method in the field of artificial intelligence.
Background
With the advent of the big data era, neural network algorithms have become a research hotspot in the field of artificial intelligence in recent years, and are widely applied in pattern recognition, image analysis, intelligent robots and other areas.
Deep learning is a machine learning method based on representation learning of data. An observation (e.g., an image) may be represented in many ways, such as a vector of intensity values for each pixel, or more abstractly as a series of edges, regions of a particular shape, and so on. Some tasks (e.g., face recognition or facial expression recognition) are easier to learn from examples when a particular representation is used.
Several deep learning architectures, such as deep neural networks, convolutional neural networks, deep belief networks, and recurrent neural networks, have been used in computer vision, speech recognition, natural language processing, audio recognition, and bioinformatics, and have achieved excellent results. In addition, deep learning has become something of a buzzword, or a rebranding of neural networks.
With the popularity of deep learning (neural networks), neural network accelerators have emerged. Through dedicated memory and operation modules, a neural network accelerator can achieve a speedup of dozens or even hundreds of times over a general-purpose processor when performing deep learning operations, with smaller area and lower power consumption.
Disclosure of Invention
The present disclosure provides a counting apparatus comprising a storage unit, a counting unit, and a register unit, wherein,
the storage unit is connected with the counting unit and used for storing the input data to be counted and the counted number of elements in the input data that satisfy a given condition;
the register unit is used for storing the address of the input data to be counted stored in the storage unit;
and the counting unit is connected with the register unit and used for acquiring a counting instruction, reading the storage address of the input data to be counted in the register unit according to the counting instruction, acquiring the corresponding input data to be counted in the storage unit, and counting the number of elements meeting given conditions in the input data to be counted to obtain a counting result.
The present disclosure further provides a counting method using the above counting device, which is characterized by comprising:
the storage unit stores the input data to be counted and the counted number of elements in the input data that satisfy a given condition;
the register unit stores the address of the input data to be counted stored in the storage unit;
the counting unit obtains a counting instruction, reads a storage address of input data to be counted in the register unit according to the counting instruction, obtains corresponding input data to be counted in the storage unit, and counts the number of elements meeting given conditions in the input data to be counted to obtain a counting result.
In the counting device and the counting method disclosed by the invention, the calculation efficiency can be improved by writing the algorithm for counting the number of elements meeting the given condition in the input data into the form of an instruction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.
Fig. 1 is a schematic diagram of a frame structure of a counting apparatus according to an embodiment of the disclosure.
Fig. 2 is a schematic structural diagram of a counting unit in the counting device according to the embodiment of the disclosure.
Fig. 3 is a schematic diagram of an adder structure in the counting unit of fig. 2.
FIG. 4 is a block diagram illustrating an instruction set format of a counting instruction in the counting apparatus according to the present disclosure.
Fig. 5 is a flowchart illustrating an implementation process of a counting unit in the counting device according to an embodiment of the disclosure.
Fig. 6 is a schematic structural diagram of a counting device according to an embodiment of the disclosure.
Fig. 7 is a flowchart of an implementation process of the counting device according to the embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure.
The present disclosure provides a counting apparatus and a counting method supporting a counting instruction, which can improve the calculation efficiency by writing an algorithm for counting the number of elements satisfying a given condition in input data (data to be counted) in the form of an instruction, which will be described in detail below with reference to specific embodiments.
In an exemplary embodiment of the present disclosure, a counting apparatus supporting a counting instruction is provided. Fig. 1 is a schematic diagram of a frame structure of a counting apparatus according to an embodiment of the disclosure. As shown in fig. 1, the counting apparatus supporting a counting instruction of the present disclosure includes: a storage unit, a counting unit, and a register unit. The storage unit is connected with the counting unit and is used for storing the input data to be counted and the counted number of elements in the input data that satisfy a given condition (the counting result). The storage unit may be a main memory; it may also be a temporary storage memory, and further may be a scratchpad memory. By temporarily storing the input data to be counted in the scratchpad memory, the counting instruction can flexibly and effectively support data of different widths, which improves execution performance.
In one embodiment, the storage unit is a scratch pad memory, and can support input data with different bit widths and/or input data occupying storage spaces with different sizes, and input data to be counted is temporarily stored on the scratch pad memory, so that the counting process can flexibly and effectively support data with different widths. The counting unit is connected with the register unit and used for acquiring a counting instruction, reading an address of input data in the register unit according to the counting instruction, then acquiring corresponding input data to be counted in the storage unit according to the address of the input data, counting the number of elements meeting given conditions in the input data, obtaining a final counting result and storing the counting result in the storage unit. The register unit is used for storing the address of the input data to be counted stored in the storage unit. In one embodiment, the register unit stores the address of the input data to be counted in the scratch pad memory.
In some embodiments, the input data to be counted may be a 0/1 vector, or may be a numerical vector or matrix. When counting the number of elements in the input data that satisfy a given condition, the condition may be that an element is the same as a given element, for example, counting the number of elements x contained in a vector A, where x may be a number n, n = 0, 1, 2, .... The condition may also be that a given expression is satisfied, for example, counting the number of elements in a vector B that are greater than a value y, where y may be an integer n, n = 0, 1, 2, ..., or a floating-point number f, f = 0.5, 0.6, ...; or, for example, counting the number of elements in a vector C that are exactly divisible by z, where z may be an integer n, n = 0, 1, 2, ....
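The three kinds of conditions described above can be illustrated with a minimal software sketch. The function name `count_matching`, the sample vectors, and the chosen values of x, y and z are assumptions made for illustration only; the disclosure describes a hardware unit, not this code.

```python
from typing import Callable, Iterable


def count_matching(data: Iterable, condition: Callable[[object], bool]) -> int:
    """Count the elements of `data` for which `condition` holds."""
    return sum(1 for element in data if condition(element))


# Hypothetical sample data (not taken from the disclosure).
vector_a = [0, 1, 2, 1, 1]
vector_b = [0.2, 0.7, 1.5, 0.5]
vector_c = [3, 4, 6, 9, 10]

# Condition 1: the element is the same as a given element x (here x = 1).
equal_to_x = count_matching(vector_a, lambda e: e == 1)

# Condition 2: the element is greater than a value y (here y = 0.5).
greater_than_y = count_matching(vector_b, lambda e: e > 0.5)

# Condition 3: the element is exactly divisible by z (here z = 3).
divisible_by_z = count_matching(vector_c, lambda e: e % 3 == 0)
```

Passing the condition as a callable keeps the counting loop identical for all three cases, which loosely mirrors the separation between the judgment step and the adding step described later in the text.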
Fig. 2 is a schematic structural diagram of a counting unit in the counting device according to the embodiment of the disclosure. As shown in fig. 2, the counting unit includes an input/output module, an operation module, and an accumulator module.
The input/output module is connected with the operation module. Each time, it takes a segment of data of a set length (the length can be configured according to actual requirements) from the input data to be counted in the storage unit and inputs the segment into the operation module. After the operation module finishes, the input/output module takes the next fixed-length segment, and so on until all elements of the input data to be counted have been taken. The input/output module also outputs the counting result calculated by the accumulator module to the storage unit.
The operation module is connected with the accumulator module. It receives a segment of data of the set length, uses its adder to add up the number of elements in the segment that satisfy the given condition, and outputs the result to the accumulator module. The operation module also includes a judgment submodule that judges whether each input element satisfies the given condition (the given condition may be that an element in the segment is the same as a given element, or that the value of an element in the segment lies within a set interval). The judgment submodule outputs 1 if the condition is satisfied and 0 otherwise, and these outputs are then sent to the adder for accumulation.
In one embodiment, the adder may include n layers, where the first layer has $l$ full adders, the second layer has $\lceil 2l/3 \rceil$ full adders, ..., and the m-th layer has $\lceil 2^{m-1} l / 3^{m-1} \rceil$ full adders; here l, m and n are integers greater than 1, m is greater than 1 and less than n, and $\lceil x \rceil$ indicates that x is rounded up to an integer. The specific working process is described below. Assume that the input data is a 0/1 vector, that the number of 1s in the 0/1 vector is to be counted, and that a fixed-length segment of the 0/1 vector has length 3l, where l is an integer greater than 1. The first layer of the adder has l full adders and the second layer has $\lceil 2l/3 \rceil$ full adders. Each full adder has 3 inputs and 2 outputs, so the first layer produces 2l outputs in total and the second layer produces about 4l/3. In the same way, the full adders on every layer have 3 inputs and 2 outputs, and the full adders on the same layer can execute in parallel. If, during the calculation, only one value remains at the i-th bit position, it can be output as the i-th bit of the final result, and the final result is the number of 1s in the 0/1 vector to be counted.
Fig. 3 shows a specific full adder structure with 7 layers (i.e., n is 7). The first layer has 6 full adders and one fixed-length segment of the 0/1 vector has length 18 (i.e., l is 6). The full adders in each layer can operate in parallel; for example, layer 3 has $\lceil 2^{m-1} l / 3^{m-1} \rceil = 3$ full adders (i.e., m is 3 and l is 6). When the input data is (0,1,0), (1,0,0), (1,1,0), (0,1,0), (1,0,0), (1,1,0), counting with the full adders of this embodiment gives the result (001000), i.e., 8. The adder increases the parallelism of the addition and effectively improves the operation speed of the operation module.
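The layered full-adder scheme can be sketched in software as follows. This is an illustration under stated assumptions, not the wiring of Fig. 3: it assumes that bits are grouped by binary weight, that each full adder's sum output keeps the weight of its inputs while its carry output moves up one weight, and that a group of only two bits is fed to a full adder with a constant 0 as the third input. The names `full_adder` and `count_ones` are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: 3 inputs, 2 outputs (sum bit, carry bit)."""
    sum_bit = a ^ b ^ c
    carry_bit = (a & b) | (a & c) | (b & c)
    return sum_bit, carry_bit


def count_ones(bits: list[int]) -> tuple[int, list[int]]:
    """Count the 1s in a 0/1 vector with layers of full adders.

    Returns (count, adders_per_layer). Assumption: columns[w] holds the
    bits whose binary weight is 2**w; all adders of one layer are
    independent and could run in parallel in hardware.
    """
    columns: dict[int, list[int]] = {0: list(bits)}
    adders_per_layer: list[int] = []
    while any(len(col) > 1 for col in columns.values()):
        next_columns: dict[int, list[int]] = {}
        adders = 0
        for weight, col in columns.items():
            i = 0
            while len(col) - i >= 2:
                group = col[i:i + 3]
                group += [0] * (3 - len(group))  # pad a 2-bit group with 0
                sum_bit, carry_bit = full_adder(*group)
                next_columns.setdefault(weight, []).append(sum_bit)
                next_columns.setdefault(weight + 1, []).append(carry_bit)
                adders += 1
                i += 3
            # A single leftover bit passes through to the next layer.
            next_columns.setdefault(weight, []).extend(col[i:])
        columns = next_columns
        adders_per_layer.append(adders)
    # Exactly one bit (or none) is left per weight: read off the result.
    count = sum(col[0] << weight for weight, col in columns.items() if col)
    return count, adders_per_layer


# The 18-bit input given in the text for Fig. 3.
segment = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
ones, layer_sizes = count_ones(segment)
result_bits = format(ones, "06b")
```

For this 18-bit segment the sketch yields 8, written as 001000 in six bits, and it happens to use 7 layers with 6 full adders in the first layer, in line with the figures quoted in the text for Fig. 3. Because a full adder preserves the weighted sum of its inputs, the final bits always encode the number of 1s regardless of how the groups are formed.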
The accumulator module is connected with the input and output module, and the result output by the operation module is accumulated by using the accumulator until no new input exists.
The counting unit has a multi-stage pipeline structure, in which the vector fetch operation in the input/output module is in the first pipeline stage, the operation module is in the second pipeline stage, and the accumulator module is in the third pipeline stage. Because these modules are in different pipeline stages, the operations required by the counting instruction can be implemented more efficiently.
FIG. 4 is a block diagram illustrating an instruction set format of a counting instruction in the counting apparatus according to the present disclosure. As shown in fig. 4, the counting instruction includes an operation code and one or more operation fields. The operation code indicates that the instruction is a counting instruction, and the counting unit performs a counting operation upon recognizing the operation code. The operation fields may include address information indicating the input data to be counted, and may further include address information of a judgment condition. For example, when a vector is to be obtained, the vector start address and the vector length can be read from the corresponding register according to the register number, and the vector stored at the corresponding address is then obtained from the storage unit according to the vector start address and the vector length. The instruction adopted by the embodiment of the disclosure has a simplified format, so that the instruction set is convenient to use and the supported data length is flexible.
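The operand lookup described above (register number, then start address and length, then the vector itself) can be sketched as follows. The opcode value, the field name `data_register`, and the choice to model a register as a (start address, length) pair are assumptions for illustration; the disclosure does not specify an encoding.

```python
from dataclasses import dataclass

COUNT_OPCODE = 0x1  # hypothetical opcode value, not from the disclosure


@dataclass(frozen=True)
class CountInstruction:
    opcode: int          # identifies the instruction as a counting instruction
    data_register: int   # register number that locates the input data


def fetch_operand(instr: CountInstruction,
                  registers: dict[int, tuple[int, int]],
                  storage: list[int]) -> list[int]:
    """Resolve the operation field to the vector held in the storage unit.

    Assumption: each register holds a (start address, length) pair.
    """
    if instr.opcode != COUNT_OPCODE:
        raise ValueError("not a counting instruction")
    start, length = registers[instr.data_register]
    return storage[start:start + length]


# Hypothetical contents of the storage unit and register unit.
storage_unit = [9, 9, 1, 0, 1, 1, 9]
register_unit = {3: (2, 4)}  # register 3 -> start address 2, length 4
vector = fetch_operand(CountInstruction(COUNT_OPCODE, 3),
                       register_unit, storage_unit)
```

Keeping only a register number in the instruction and the address and length in the register is one way to let the same instruction format cover vectors of different lengths.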
Fig. 5 is a flowchart illustrating an implementation process of a counting unit in the counting device according to an embodiment of the disclosure. As shown in fig. 5, in operation, the counting unit obtains the address of the input data to be counted from the register unit according to the address information in the operation field of the counting instruction, and then obtains the input data to be counted from the storage unit according to that address. The input data to be counted is stored in the scratchpad memory. Each time, the counting unit takes a fixed-length segment of input data from the scratchpad memory, the judgment submodule judges whether each element satisfies the given condition, and the adder counts the number of elements in the segment that satisfy the condition. The accumulator module accumulates the per-segment counts to obtain the final counting result, which is stored in the storage unit.
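The segment-by-segment flow of Fig. 5 can be sketched in software as below. The function names `judge` and `count_in_segments` and the sample data are hypothetical, and the plain `sum` stands in for the hardware adder; the three pipeline stages are executed one after another here rather than overlapped.

```python
from typing import Callable, Sequence


def judge(segment: Sequence, condition: Callable[[object], bool]) -> list[int]:
    """Judgment submodule: output 1 for each element meeting the condition, else 0."""
    return [1 if condition(element) else 0 for element in segment]


def count_in_segments(data: Sequence,
                      condition: Callable[[object], bool],
                      segment_length: int) -> int:
    """Count matching elements by processing fixed-length segments."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    accumulator = 0
    for start in range(0, len(data), segment_length):
        segment = data[start:start + segment_length]  # input/output module
        flags = judge(segment, condition)              # judgment submodule
        partial = sum(flags)                           # adder (modelled by sum)
        accumulator += partial                         # accumulator module
    return accumulator


# Hypothetical example: count multiples of 3 among 0..19 in segments of 6.
total = count_in_segments(list(range(20)), lambda e: e % 3 == 0, 6)
```

The last segment may be shorter than the set length; slicing handles that case without padding.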
Fig. 6 is a detailed structural schematic diagram of a counting device according to an embodiment of the disclosure. As shown in fig. 6, the apparatus for supporting counting instructions of the present disclosure may further include: the device comprises an instruction memory, an instruction processing unit, an instruction cache unit and a dependency relationship processing unit.
The instruction processing unit is used for acquiring the counting instruction from the instruction memory, processing it, and then providing it to the instruction cache unit and the dependency processing unit. The instruction processing unit includes an instruction fetching module and a decoding module. The instruction fetching module is connected with the instruction memory and used for obtaining a counting instruction from the instruction memory; the decoding module is connected with the instruction fetching module and used for decoding the acquired counting instruction. In addition, the instruction processing unit may further include an instruction queue memory, which is connected to the decoding module and configured to store the decoded counting instructions in order and send them in order to the instruction cache unit and the dependency processing unit. Because the instruction cache unit and the dependency processing unit can hold only a limited number of instructions, the instructions in the instruction queue memory must wait until the instruction cache unit and the dependency processing unit have free space before sequential issue can continue.
The instruction cache unit can be connected with the instruction processing unit and is used for storing, in order, the counting instructions to be executed. A counting instruction also remains cached in the instruction cache unit while it executes. When an instruction finishes executing, its execution result (the counting result) is transmitted to the instruction cache unit; if the instruction is also the earliest of the uncommitted instructions in the instruction cache unit, the instruction is committed and its execution result (the counting result) is written back to the cache memory together with it. In one embodiment, the instruction cache unit may be a reorder buffer.
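The commit rule described above (an instruction commits only when it has finished and is the earliest uncommitted instruction) can be sketched as a small reorder-buffer model. The class and method names are hypothetical, and the model tracks only instruction identifiers and results, not the write-back itself.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Minimal model of in-order commit; an illustrative assumption,
    not the disclosed hardware design."""

    def __init__(self) -> None:
        # instruction id -> result (None while still executing), in issue order
        self.entries: "OrderedDict[str, int | None]" = OrderedDict()

    def issue(self, instr_id: str) -> None:
        self.entries[instr_id] = None

    def finish(self, instr_id: str, result: int) -> list[tuple[str, int]]:
        """Record a result, then commit every finished instruction at the head."""
        self.entries[instr_id] = result
        committed = []
        while self.entries:
            oldest_id, oldest_result = next(iter(self.entries.items()))
            if oldest_result is None:
                break  # the earliest instruction has not finished yet
            self.entries.popitem(last=False)
            committed.append((oldest_id, oldest_result))
        return committed


rob = ReorderBuffer()
rob.issue("count_1")
rob.issue("count_2")
first = rob.finish("count_2", 5)   # finished out of order: nothing commits
second = rob.finish("count_1", 3)  # head finished: both commit in order
```

In this model a later instruction that finishes early simply waits in the buffer, which is what allows out-of-order execution with in-order commit.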
The dependency processing unit can be connected with the instruction queue memory and the counting unit. Before the counting unit acquires a counting instruction, the dependency processing unit judges whether the vector required by the counting instruction (i.e., the vector to be counted) is up to date. If so, it provides the counting instruction directly to the counting unit; otherwise, the counting instruction is stored in a storage queue of the dependency processing unit and is provided to the counting unit after the required vector has been updated. In particular, when a counting instruction accesses the scratchpad memory, the storage space it accesses may still be waiting for the result of a previous instruction to be written. To ensure the correctness of the execution result, if the current instruction is detected to have a data dependency on a previous instruction, the instruction must wait in the storage queue until the dependency is eliminated. The dependency processing unit allows instructions to be executed out of order and committed in order, effectively reduces pipeline stalls, and makes precise exceptions possible.
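One way to picture the dependency check is as an address-range overlap test against writes that earlier instructions have not yet completed. The sketch below assumes that a dependency means an overlap between the range a counting instruction reads and a pending write range; the class `DependencyUnit` and its methods are hypothetical and are not taken from the disclosure.

```python
from collections import deque


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two (start, length) address ranges share any address."""
    (a_start, a_len), (b_start, b_len) = a, b
    return a_start < b_start + b_len and b_start < a_start + a_len


class DependencyUnit:
    """Illustrative model: holds back instructions whose input is not up to date."""

    def __init__(self) -> None:
        self.pending_writes: list[tuple[int, int]] = []  # unfinished earlier writes
        self.store_queue: deque = deque()                # waiting instructions

    def submit(self, name: str, read_range: tuple[int, int]):
        """Return the instruction if it can issue now, else queue it and return None."""
        if any(ranges_overlap(read_range, w) for w in self.pending_writes):
            self.store_queue.append((name, read_range))
            return None
        return name

    def complete_write(self, write_range: tuple[int, int]) -> list[str]:
        """Retire a pending write and release queued instructions in order."""
        self.pending_writes.remove(write_range)
        released = []
        while self.store_queue and not any(
            ranges_overlap(self.store_queue[0][1], w) for w in self.pending_writes
        ):
            released.append(self.store_queue.popleft()[0])
        return released


unit = DependencyUnit()
unit.pending_writes.append((0, 8))          # an earlier instruction writes 0..7
blocked = unit.submit("count_a", (4, 4))    # reads 4..7: must wait
issued = unit.submit("count_b", (16, 4))    # reads 16..19: independent
released = unit.complete_write((0, 8))      # dependency eliminated
```

Releasing only from the head of the queue keeps waiting instructions in their original order, which is a simplification chosen for this sketch.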
The instruction fetching module is responsible for fetching a next instruction to be executed from the instruction memory and transmitting the instruction to the decoding module; the decoding module is responsible for decoding the instruction and transmitting the decoded instruction to the instruction queue memory; the instruction queue memory is used for caching the decoded instructions, and sending the instructions to the instruction cache unit and the dependency relationship processing unit after the instruction cache unit and the dependency relationship processing unit are idle; during the process that the counting instruction is sent to the dependency processing unit from the instruction queue memory, the counting instruction reads the address of the input data in the storage unit from the register unit; the dependency processing unit is used for processing data dependency between a current instruction and a previous instruction, the counting instruction accesses the storage unit, and other instructions executed before may access the same block of storage space. In order to ensure the correctness of the execution result of the instruction, if the current instruction is detected to have a dependency relationship with the data of the previous instruction, the instruction must wait in the storage queue of the dependency relationship processing unit until the dependency relationship is eliminated. The counting unit obtains a counting instruction from the dependency processing unit, reads the address of input data to be counted from the register unit according to the counting instruction, obtains corresponding input data to be counted from the storage unit, counts the number of elements meeting given conditions in the input data, transmits the counting result to the instruction cache unit, and finally writes the counting result and the counting instruction back to the storage unit.
Fig. 7 is a flowchart of an implementation process of the counting device according to the embodiment of the disclosure. As shown in fig. 7, the process of executing the count instruction includes:
S701, the instruction fetching module fetches the counting instruction from the instruction memory and sends the counting instruction to the decoding module.
S702, the decoding module decodes the counting instruction and sends the counting instruction to the instruction queue memory.
S703, the counting instruction waits in the instruction queue memory until the instruction cache unit and the dependency processing unit have free space, and is then sent to the instruction cache unit and the dependency processing unit.
S704, while the counting instruction is being sent from the instruction queue memory to the dependency processing unit, the counting instruction reads the storage address of the input data in the storage unit from the register unit. The dependency processing unit analyzes whether the instruction has a data dependency on a previous instruction that has not finished executing; if so, the counting instruction waits in the storage queue of the dependency processing unit until it no longer has a data dependency on any previous unfinished instruction.
S705, once no dependency exists, the counting instruction is sent to the counting unit. The counting unit acquires the input data from the storage unit according to the storage address, and counts the number of elements in the input data that satisfy the given condition.
S706, after the counting is completed, the counting result is written back to the storage unit through the instruction cache unit, and the instruction cache unit commits the counting instruction to the storage unit.
Up to this point, the present embodiment has been described in detail with reference to the accompanying drawings. From the above description, those skilled in the art should have a clear understanding of the counting device supporting counting instructions and the counting method thereof according to the embodiments of the present disclosure.
In some embodiments, a chip is also disclosed, which comprises the counting device.
In some embodiments, a chip packaging structure is also disclosed, which includes the above chip.
In some embodiments, a board card is further disclosed, which includes the above chip package structure.
In one embodiment, an electronic device is also disclosed, which comprises the above board card.
The electronic device may include, but is not limited to, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a webcam, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may comprise an aircraft, a ship, and/or a road vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
In the embodiments provided in the present disclosure, it should be understood that the disclosed related devices and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the described parts or modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of parts or modules may be combined or integrated into a system, or some features may be omitted or not executed.
In this disclosure, the term "and/or" may have been used. As used herein, the term "and/or" means one or the other or both (e.g., A and/or B means A or B or both A and B).
In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It will be apparent, however, to one skilled in the art, that one or more other embodiments may be practiced without some of these specific details. The specific embodiments described are not intended to limit the disclosure but rather to illustrate it. The scope of the present disclosure is not to be determined by the specific examples provided above but only by the claims below. In other instances, well-known circuits, structures, devices, and operations are shown in block diagram form, rather than in detail, in order not to obscure an understanding of the description. Where considered appropriate, reference numerals or terminal portions of reference numerals have been repeated among the figures to indicate corresponding or analogous elements, optionally having similar characteristics or identical features, unless otherwise specified or evident.
Various operations and methods have been described. Some methods have been described in a relatively basic manner in a flow chart form, but operations may alternatively be added to and/or removed from the methods. Additionally, while the flow diagrams illustrate a particular order of operation according to example embodiments, it is understood that this particular order is exemplary. Alternative embodiments may optionally perform these operations in a different manner, combine certain operations, interleave certain operations, etc. The components, features, and specific optional details of the devices described herein may also optionally be applied to the methods described herein, which may be performed by and/or within such devices in various embodiments.
Each functional part/unit/subunit/module/submodule/component in the present disclosure may be hardware, for example, the hardware may be a circuit, including a digital circuit, an analog circuit, and the like. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The computing module in the computing device may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like. The memory unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
The embodiments described above further illustrate the objects, aspects, and advantages of the present disclosure in detail. It should be understood that these embodiments are only illustrative of the present disclosure and are not intended to limit it; any modifications, equivalents, improvements, and the like made within the spirit and principle of the present disclosure should be included in its scope.

Claims (16)

1. A counting device, comprising a storage unit, a counting unit and a register unit, wherein:
the storage unit is connected with the counting unit and is used for storing input data to be counted and for storing the counted number of elements in the input data that meet a given condition;
the register unit is used for storing the address at which the input data to be counted is stored in the storage unit;
the counting unit is connected with the register unit and is used for acquiring a counting instruction, reading the storage address of the input data to be counted from the register unit according to the counting instruction, acquiring the corresponding input data to be counted from the storage unit, and counting the number of elements in the input data to be counted that meet the given condition to obtain a counting result;
the counting unit comprises an input-output module, an operation module and an accumulator module, wherein:
the input-output module is connected with the operation module and is used for fetching, each time, data of a set length from the input data to be counted and inputting the data into the operation module;
the operation module comprises an adder for adding up the number of elements in the set-length data that meet the given condition and outputting the obtained result to the accumulator module;
the accumulator module is used for accumulating the results obtained by the operation module;
the structure of the adder comprises n layers, wherein: the first layer has $l$ full adders, the second layer has $\lceil 2l/3 \rceil$ full adders, and the m-th layer has $\lceil 2^{m-1} l / 3^{m-1} \rceil$ full adders, wherein l, m and n are integers greater than 1, m is an integer greater than 1 and less than or equal to n, and $\lceil x \rceil$ indicates that a rounding-up operation is performed on data x.
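As a rough illustration of the layer sizing recited in claim 1, the following Python sketch computes the number of full adders per layer. It assumes that the m-th layer holds ceil(2^(m-1) * l / 3^(m-1)) full adders, which follows from each full adder having 3 inputs and 2 outputs; the function name `full_adders_per_layer` is hypothetical and not part of the disclosure.

```python
def full_adders_per_layer(l: int, n: int) -> list[int]:
    """Return the number of full adders in layers 1..n.

    Assumption: layer m holds ceil(2**(m-1) * l / 3**(m-1)) full adders,
    because the 2 outputs of every full adder in one layer feed the
    3-input full adders of the next layer.
    """
    if l < 2 or n < 2:
        raise ValueError("l and n must be integers greater than 1")
    sizes = []
    for m in range(1, n + 1):
        numerator = 2 ** (m - 1) * l
        denominator = 3 ** (m - 1)
        # Integer ceiling division avoids floating-point rounding errors.
        sizes.append(-(-numerator // denominator))
    return sizes


if __name__ == "__main__":
    # Example: a first layer of 9 full adders handles 27 input bits.
    print(full_adders_per_layer(9, 4))
```

For example, with l = 9 the first layer takes 27 bits, and each later layer needs roughly two thirds as many full adders as the layer before it.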
2. The counting device according to claim 1, wherein the storage unit is a main memory and/or a scratch pad memory.
3. The counting device according to claim 1, wherein the given condition includes: a plurality of elements in the set-length data are the same as a given element; or
the numerical values of a plurality of elements in the set-length data fall within a set interval.
4. The counting device of claim 1, wherein the operation module further comprises:
a judgment submodule for judging whether the set-length data meets the given condition, outputting 1 if so and 0 if not, and then sending the output value to the adder for accumulation.
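The two kinds of given condition in claim 3 and the 1/0 output of the judgment submodule in claim 4 can be sketched in software as follows. This is only an illustrative model: the names `equals`, `in_interval`, `judge` and `count_chunk` are hypothetical, the interval is assumed to be closed, and the judgment is assumed to be applied element by element.

```python
from typing import Callable, Sequence

Condition = Callable[[float], bool]


def equals(given: float) -> Condition:
    """Condition: the element is the same as a given element."""
    return lambda x: x == given


def in_interval(low: float, high: float) -> Condition:
    """Condition: the element lies in a set interval (assumed closed)."""
    return lambda x: low <= x <= high


def judge(chunk: Sequence[float], condition: Condition) -> list[int]:
    """Model of the judgment submodule: output 1 if an element meets
    the condition and 0 otherwise."""
    return [1 if condition(x) else 0 for x in chunk]


def count_chunk(chunk: Sequence[float], condition: Condition) -> int:
    """Model of the adder: sum the 1/0 outputs for one set-length chunk."""
    return sum(judge(chunk, condition))


if __name__ == "__main__":
    print(judge([3, 5, 3, 7], equals(3)))
    print(count_chunk([0.5, 1.5, 2.5, 3.5], in_interval(1, 3)))
```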
5. The counting device of claim 1, wherein the counting unit has a multi-stage pipeline structure, wherein the vector fetching operation in the input-output module is in a first pipeline stage, the operation module is in a second pipeline stage, and the accumulator module is in a third pipeline stage.
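The three stages named in claim 5 can be modelled, very loosely, as chained Python generators: one stage fetches set-length chunks, one counts matching elements per chunk, and one accumulates the partial counts. This sketch only shows the separation of stages and does not model the cycle-level overlap of a hardware pipeline; all function names and the `set_length` parameter are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence


def fetch_chunks(data: Sequence[int], set_length: int) -> Iterator[Sequence[int]]:
    """Stage 1 (input-output module): fetch data of a set length each time."""
    for start in range(0, len(data), set_length):
        yield data[start:start + set_length]


def operate(chunks: Iterable[Sequence[int]],
            condition: Callable[[int], bool]) -> Iterator[int]:
    """Stage 2 (operation module): count matching elements in each chunk."""
    for chunk in chunks:
        yield sum(1 for x in chunk if condition(x))


def accumulate(partials: Iterable[int]) -> int:
    """Stage 3 (accumulator module): accumulate the per-chunk results."""
    total = 0
    for partial in partials:
        total += partial
    return total


def count(data: Sequence[int], set_length: int,
          condition: Callable[[int], bool]) -> int:
    """Chain the three stages to obtain the counting result."""
    return accumulate(operate(fetch_chunks(data, set_length), condition))


if __name__ == "__main__":
    print(count([1, 0, 1, 1, 0, 1, 1], 3, lambda x: x == 1))
```

Because each stage consumes the previous stage lazily, a chunk flows through all three stages before the next chunk is fetched, which is the closest a sequential program comes to the staged data flow described above.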
6. The counting device of claim 1, wherein the counting instruction comprises an opcode and one or more operation fields, wherein:
the opcode is used for indicating that the instruction is a counting instruction, and the counting unit performs the counting operation upon identifying the opcode;
the operation fields include: address information indicating the input data to be counted in the counting instruction, and/or address information of a judgment condition.
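To make the instruction layout in claim 6 concrete, the following sketch packs an opcode and two operation fields into a single integer word. The claim does not specify any encoding, so the opcode value, the 8-bit field widths, the `NO_CONDITION` marker and all names here are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

COUNT_OPCODE = 0x2A   # hypothetical opcode value for a counting instruction
NO_CONDITION = 0xFF   # hypothetical marker: no judgment-condition field


@dataclass(frozen=True)
class CountInstruction:
    opcode: int
    input_addr_reg: int                       # register holding the input data address
    condition_addr_reg: Optional[int] = None  # register holding the condition address


def encode(instr: CountInstruction) -> int:
    """Pack the instruction into an assumed 24-bit word: opcode | input | condition."""
    cond = NO_CONDITION if instr.condition_addr_reg is None else instr.condition_addr_reg
    return (instr.opcode << 16) | (instr.input_addr_reg << 8) | cond


def decode(word: int) -> CountInstruction:
    """Unpack a word produced by encode()."""
    cond = word & 0xFF
    return CountInstruction(
        opcode=(word >> 16) & 0xFF,
        input_addr_reg=(word >> 8) & 0xFF,
        condition_addr_reg=None if cond == NO_CONDITION else cond,
    )


def is_count_instruction(instr: CountInstruction) -> bool:
    """The counting unit recognises a counting instruction by its opcode."""
    return instr.opcode == COUNT_OPCODE


if __name__ == "__main__":
    word = encode(CountInstruction(COUNT_OPCODE, 3, 4))
    print(hex(word), decode(word))
```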
7. The counting device of claim 1, further comprising:
an instruction memory for storing counting instructions;
an instruction processing unit, connected with the instruction memory and used for acquiring a counting instruction from the instruction memory and processing the counting instruction;
an instruction cache unit, connected with the instruction processing unit and used for sequentially storing counting instructions that are to be executed or are being executed; the counting unit is also connected with the storage unit and used for submitting the executed counting instruction and the counting result to the storage unit;
a dependency relationship processing unit, connected with the instruction processing unit and used for judging, before the counting unit acquires the counting instruction, whether the input data required by the counting instruction is up to date; if so, directly providing the counting instruction to the counting unit; otherwise, storing the counting instruction in a storage queue of the dependency relationship processing unit, and providing the counting instruction in the storage queue to the counting unit after the required input data has been updated;
wherein, while the counting instruction is transmitted from the instruction processing unit to the dependency relationship processing unit, the counting instruction reads the storage address of the input data in the storage unit from the register unit.
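The behaviour of the dependency relationship processing unit in claim 7 can be sketched as a small queueing model: an instruction whose input data is up to date is issued at once, while any other instruction waits in a storage queue until its data has been updated. The class `DependencyUnit`, its methods, and the use of a set of pending addresses to represent "not up to date" are all assumptions made for illustration.

```python
from collections import deque


class DependencyUnit:
    """Toy model of the dependency relationship processing unit."""

    def __init__(self) -> None:
        self.pending_writes: set[int] = set()  # addresses whose data is not yet up to date
        self.store_queue: deque = deque()      # instructions waiting for their input data
        self.issued: list[str] = []            # stand-in for the counting unit's input

    def mark_pending(self, addr: int) -> None:
        """Record that the data at addr is still being produced."""
        self.pending_writes.add(addr)

    def submit(self, name: str, input_addr: int) -> None:
        """Issue the instruction directly if its input is up to date, else queue it."""
        if input_addr in self.pending_writes:
            self.store_queue.append((name, input_addr))
        else:
            self.issued.append(name)

    def mark_updated(self, addr: int) -> None:
        """Data at addr is now up to date: release waiting instructions in order."""
        self.pending_writes.discard(addr)
        still_waiting: deque = deque()
        for name, input_addr in self.store_queue:
            if input_addr in self.pending_writes:
                still_waiting.append((name, input_addr))
            else:
                self.issued.append(name)
        self.store_queue = still_waiting


if __name__ == "__main__":
    unit = DependencyUnit()
    unit.mark_pending(0x100)
    unit.submit("count_a", 0x100)  # must wait for address 0x100
    unit.submit("count_b", 0x200)  # input is up to date, issued at once
    unit.mark_updated(0x100)       # count_a is released
    print(unit.issued)
```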
8. The counting device of claim 7, wherein the instruction processing unit comprises:
an instruction fetching module, connected with the instruction memory and used for obtaining a counting instruction from the instruction memory;
a decoding module, connected with the instruction fetching module and used for decoding the acquired counting instruction;
and an instruction queue, connected with the decoding module and used for sequentially storing the decoded counting instructions and sequentially transmitting the instructions to the instruction cache unit and the dependency relationship processing unit.
9. The counting device of claim 8, wherein the instruction cache unit is a reorder cache unit.
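The fetch, decode and queue flow of claim 8 can be outlined as below. The textual instruction format (for example "COUNT r1 r2"), the dictionary produced by decoding, and the two output lists standing in for the instruction cache unit and the dependency relationship processing unit are all hypothetical simplifications.

```python
from collections import deque


def fetch(instruction_memory: list[str], pc: int) -> str:
    """Instruction fetching module: read one raw instruction from memory."""
    return instruction_memory[pc]


def decode_text(raw: str) -> dict:
    """Decoding module: split a raw instruction into opcode and operation fields."""
    opcode, *fields = raw.split()
    return {"opcode": opcode, "fields": fields}


def process(instruction_memory: list[str]) -> tuple[list[dict], list[dict]]:
    """Fetch and decode every instruction, queue the results in order, then
    forward each one to both the instruction cache and the dependency unit."""
    queue: deque = deque()
    for pc in range(len(instruction_memory)):
        queue.append(decode_text(fetch(instruction_memory, pc)))

    instruction_cache: list[dict] = []
    dependency_inbox: list[dict] = []
    while queue:
        instr = queue.popleft()          # first in, first out preserves program order
        instruction_cache.append(instr)
        dependency_inbox.append(instr)
    return instruction_cache, dependency_inbox


if __name__ == "__main__":
    cache, inbox = process(["COUNT r1", "COUNT r2 r3"])
    print(cache)
```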
10. The counting device of claim 1, wherein the data type of the input data to be counted is a 0/1 vector, a general numerical vector, or a matrix.
11. A counting method using the counting device according to any one of claims 1 to 10, comprising:
the storage unit stores input data to be counted and stores the counted number of elements in the input data that meet a given condition;
the register unit stores the address at which the input data to be counted is stored in the storage unit;
the counting unit acquires a counting instruction, reads the storage address of the input data to be counted from the register unit according to the counting instruction, acquires the corresponding input data to be counted from the storage unit, and counts the number of elements in the input data to be counted that meet the given condition to obtain a counting result;
wherein the reading, according to the counting instruction, of the storage address of the input data to be counted from the register unit, the acquiring of the corresponding input data to be counted from the storage unit, and the counting of the number of elements in the input data to be counted that meet the given condition specifically include:
the input-output module fetches, each time, data of a set length from the input data to be counted in the storage unit and inputs the data into the operation module;
the operation module adds up the number of elements in the set-length data that meet the given condition, and outputs the obtained result to the accumulator module;
the accumulator module accumulates the results obtained by the operation module;
wherein the operation module adding up the number of elements in the set-length data that meet the given condition and outputting the obtained result to the accumulator module comprises:
setting the type of the input data as a 0/1 vector and counting the number of 1s in the 0/1 vector to be counted, wherein the length of a fixed-length segment of the 0/1 vector is 3l, and l is an integer greater than 1;
inputting the fixed-length 0/1 vector segment of length 3l into the adder, wherein the first layer of the adder has l full adders and the second layer of the adder has $\lceil 2l/3 \rceil$ full adders; each full adder has 3 inputs and 2 outputs, the first layer obtains 4l/3 outputs, and the adders in the same layer are executed in parallel; the full adders in each layer have 3 inputs and 2 outputs;
and if, in the calculation process, the number of the i-th data is 1, outputting the i-th data as a final result, wherein the final result is the number of 1s in the 0/1 vector to be counted.
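The layered full-adder counting in claim 11 can be simulated in software as a carry-save reduction. The sketch below is one possible reading, not the claimed circuit: it assumes that bits are grouped by binary weight, that every full adder turns three bits of one weight into a sum bit of the same weight and a carry bit of the next weight, and that a weight with a single remaining bit contributes that bit to the final result. The function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """A full adder: 3 input bits -> (sum bit, carry bit)."""
    sum_bit = a ^ b ^ c
    carry_bit = (a & b) | (a & c) | (b & c)
    return sum_bit, carry_bit


def count_ones_adder_tree(bits: list[int]) -> int:
    """Count the 1s in a 0/1 vector using layers of full adders.

    columns maps a binary weight (bit position) to the bits that
    currently carry that weight. Each loop iteration models one layer.
    """
    columns = {0: list(bits)}
    while any(len(col) > 1 for col in columns.values()):
        next_columns: dict[int, list[int]] = {}
        for weight, col in columns.items():
            groups = len(col) // 3
            for i in range(groups):
                s, c = full_adder(*col[3 * i:3 * i + 3])
                next_columns.setdefault(weight, []).append(s)
                next_columns.setdefault(weight + 1, []).append(c)
            tail = col[3 * groups:]
            if len(tail) == 2:
                # Two leftover bits: tie the third adder input to 0.
                s, c = full_adder(tail[0], tail[1], 0)
                next_columns.setdefault(weight, []).append(s)
                next_columns.setdefault(weight + 1, []).append(c)
            elif len(tail) == 1:
                # A single leftover bit passes through to the next layer.
                next_columns.setdefault(weight, []).append(tail[0])
        columns = next_columns
    # Every weight now holds at most one bit: assemble the binary result.
    return sum(col[0] << weight for weight, col in columns.items() if col)


if __name__ == "__main__":
    segment = [1, 0, 1, 1, 1, 0]  # length 3l with l = 2
    print(count_ones_adder_tree(segment))
```

Each full adder preserves the arithmetic value of its inputs (a + b + c equals the sum bit plus twice the carry bit), so the assembled result equals the number of 1s in the input regardless of how the bits are grouped.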
12. The counting method according to claim 11, wherein the storage unit is a main memory and/or a scratch pad memory.
13. The counting method of claim 11, further comprising:
judging, by a judgment submodule, whether the set-length data meets the given condition, outputting 1 if so and 0 if not, and then sending the output value to the adder for accumulation.
14. The counting method of claim 11, wherein the counting instruction comprises an opcode and one or more operation fields, wherein:
the opcode is used for indicating that the instruction is a counting instruction, and the counting unit performs the counting operation upon identifying the opcode;
the operation fields include: address information indicating the input data to be counted in the counting instruction, and/or address information of a judgment condition.
15. The counting method of claim 11, further comprising the steps of:
storing, by an instruction memory, a count instruction;
acquiring a counting instruction from an instruction memory through an instruction processing unit, and processing the counting instruction;
sequentially storing, through an instruction cache unit, counting instructions that are to be executed or are being executed; the counting unit is connected with the storage unit and is used for submitting the executed counting instruction and the counting result to the storage unit;
judging, through a dependency relationship processing unit and before the counting unit acquires a counting instruction, whether the input data required by the counting instruction is up to date; if so, directly providing the counting instruction to the counting unit; otherwise, storing the counting instruction in a storage queue of the dependency relationship processing unit, and providing the counting instruction in the storage queue to the counting unit after the required input data has been updated;
wherein, while the counting instruction is transmitted from the instruction processing unit to the dependency relationship processing unit, the counting instruction reads the storage address of the input data in the storage unit from the register unit.
16. The counting method according to claim 15, wherein the obtaining of the counting instruction from the instruction memory by the instruction processing unit and the processing of the counting instruction specifically include:
the instruction fetching module obtains a counting instruction from the instruction memory;
the decoding module decodes the acquired counting instruction;
the instruction queue stores the decoded counting instructions in sequence and transmits the instructions to the instruction cache unit and the dependency relationship processing unit in sequence.
CN201811097569.9A 2017-04-19 2018-04-17 Counting device and counting method Active CN109324826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/697,687 US11734002B2 (en) 2017-04-19 2019-11-27 Counting elements in neural network input data

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201710264686.9A CN108733408A (en) 2017-04-21 2017-04-21 Counting device and method of counting
CN2017102646869 2017-04-21
CN201880000923.3A CN109121435A (en) 2017-04-19 2018-04-17 Processing unit and processing method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201880000923.3A Division CN109121435A (en) 2017-04-19 2018-04-17 Processing unit and processing method

Publications (2)

Publication Number Publication Date
CN109324826A CN109324826A (en) 2019-02-12
CN109324826B true CN109324826B (en) 2021-03-26

Family

ID=63933782

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710264686.9A Pending CN108733408A (en) 2017-04-19 2017-04-21 Counting device and method of counting
CN201811097569.9A Active CN109324826B (en) 2017-04-19 2018-04-17 Counting device and counting method

Country Status (1)

Country Link
CN (2) CN108733408A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101571796A (en) * 2008-04-28 2009-11-04 阿尔特拉公司 Configurable hybrid adder circuitry
CN102707931A (en) * 2012-05-09 2012-10-03 刘大可 Digital signal processor based on parallel data channel
CN102866875A (en) * 2012-10-05 2013-01-09 刘杰 Universal multi-operand summator
CN103699360A (en) * 2012-09-27 2014-04-02 北京中科晶上科技有限公司 Vector processor and vector data access and interaction method thereof
CN104699458A (en) * 2015-03-30 2015-06-10 哈尔滨工业大学 Fixed point vector processor and vector data access controlling method thereof
CN106066783A (en) * 2016-06-02 2016-11-02 华为技术有限公司 The neutral net forward direction arithmetic hardware structure quantified based on power weight

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7734674B2 (en) * 2005-08-08 2010-06-08 Freescale Semiconductor, Inc. Fast fourier transform (FFT) architecture in a multi-mode wireless processing system
CN101685388B (en) * 2008-09-28 2013-08-07 北京大学深圳研究生院 Method and device for executing comparison operation
US20140108480A1 (en) * 2011-12-22 2014-04-17 Elmoustapha Ould-Ahmed-Vall Apparatus and method for vector compute and accumulate
US9110657B2 (en) * 2013-01-21 2015-08-18 Tom Yap Flowchart compiler for a compound complex instruction set computer (CCISC) processor architecture
US9507594B2 (en) * 2013-07-02 2016-11-29 Intel Corporation Method and system of compiling program code into predicated instructions for execution on a processor without a program counter
US9513907B2 (en) * 2013-08-06 2016-12-06 Intel Corporation Methods, apparatus, instructions and logic to provide vector population count functionality
US9495155B2 (en) * 2013-08-06 2016-11-15 Intel Corporation Methods, apparatus, instructions and logic to provide population count functionality for genome sequencing and alignment
CN105207794B (en) * 2014-06-05 2019-11-05 南京中兴软件有限责任公司 Statistical counting equipment and its implementation, the system with statistical counting equipment
CN106355246B (en) * 2015-10-08 2019-02-15 上海兆芯集成电路有限公司 Three configuration neural network units
CN105426160B (en) * 2015-11-10 2018-02-23 北京时代民芯科技有限公司 The multiple shooting method of instruction classification based on SPRAC V8 instruction set
CN106447034B (en) * 2016-10-27 2019-07-30 中国科学院计算技术研究所 A kind of neural network processor based on data compression, design method, chip

Also Published As

Publication number Publication date
CN108733408A (en) 2018-11-02
CN109324826A (en) 2019-02-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant