CN111738428B - Computing device, method and related product - Google Patents


Info

Publication number
CN111738428B
Authority
CN
China
Prior art keywords
control signal
circuit
nonlinear
result
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910227493.5A
Other languages
Chinese (zh)
Other versions
CN111738428A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910227493.5A priority Critical patent/CN111738428B/en
Publication of CN111738428A publication Critical patent/CN111738428A/en
Application granted granted Critical
Publication of CN111738428B publication Critical patent/CN111738428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a computing device, a computing method and a related product, wherein the related product comprises a neural network chip and a board card. The board card comprises a storage device, an interface device, a control device and the neural network chip, and the neural network chip is connected to the storage device, the control device and the interface device, respectively; the storage device is used for storing data; the interface device is used for realizing data transmission between the neural network chip and external equipment; the control device is used for monitoring the state of the neural network chip. The computing device provided by the application has the advantages of low cost and low power consumption.

Description

Computing device, method and related product
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a computing device, a computing method, and related products.
Background
Neural networks underlie many current artificial intelligence applications, including speech recognition, image processing, data analysis, advertisement recommendation systems, and autonomous driving, so deep neural networks now touch many aspects of daily life. However, the amount of computation required by neural networks, especially deep learning neural networks, is enormous, which restricts their faster development and wider application. When a designer considers using a hardware accelerator to speed up deep learning neural network operations, the extremely high energy cost caused by that huge amount of computation in turn restricts the design and wide adoption of the accelerator.
Disclosure of Invention
The embodiments of the present application provide a computing device, a computing method and related products, in which, when a neural network computation is performed, an operation circuit or an operator within an operation circuit is turned off so as to reduce computing power consumption and improve computing speed.
In a first aspect, an embodiment of the present application provides a computing device, including a control unit and an operation unit;
the control unit is used for receiving a first control signal and controlling the switching state of part of operation circuits in the operation unit according to the first control signal;
the control unit is further used for acquiring input data, obtaining a second control signal according to the input data, and sending the second control signal to one or more operation circuits in the operation unit, wherein the second control signal is used for controlling the on-off state of an operator of the one or more operation circuits;
the operation unit is used for performing operation according to the received control signal to obtain an operation result.
In a second aspect, an embodiment of the present application provides a machine learning computing device, including one or more computing devices according to the first aspect. The machine learning computing device is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the computing devices, the computing devices can be linked to one another and transmit data through a specific structure;
wherein the plurality of computing devices are interconnected and transmit data through a PCIE bus so as to support larger-scale machine learning operations; the plurality of computing devices share the same control system or have their own control systems; the plurality of computing devices share memory or have their own memories; and the plurality of computing devices may be interconnected in any interconnection topology.
In a third aspect, an embodiment of the present application provides a combination processing apparatus, including the machine learning computing device according to the second aspect, a universal interconnect interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete an operation specified by the user. The combination processing apparatus may further include a storage device connected to the machine learning computing device and to the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
In a fourth aspect, an embodiment of the present application provides a neural network chip, where the neural network chip includes the computing device described in the first aspect, the machine learning computing device described in the second aspect, or the combination processing device described in the third aspect.
In a fifth aspect, an embodiment of the present application provides a neural network chip packaging structure, where the neural network chip packaging structure includes the neural network chip described in the fourth aspect.
In a sixth aspect, an embodiment of the present application provides a board card, where the board card includes the neural network chip packaging structure described in the fifth aspect.
In a seventh aspect, an embodiment of the present application provides an electronic device, where the electronic device includes the neural network chip described in the fourth aspect or the board card described in the sixth aspect.
In an eighth aspect, an embodiment of the present application provides a computing method, which is applied to a computing device, where the computing device includes a control unit and an operation unit;
the control unit receives a first control signal and controls the on-off state of part of operation circuits in the operation unit according to the first control signal;
the control unit acquires input data, obtains a second control signal according to the input data, and sends the second control signal to one or more operation circuits in the operation unit, wherein the second control signal is used for controlling the on-off state of an operator of the one or more operation circuits;
The operation unit performs operation according to the received control signal to obtain an operation result.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a means of transport, a household appliance, and/or a medical device.
In some embodiments, the means of transport comprises an aircraft, a ship, and/or a vehicle; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas cooker and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 1B is a schematic diagram of another computing device according to an embodiment of the present application;
FIG. 1C is a schematic diagram of a conventional addition tree;
FIG. 1D is a schematic diagram of an addition tree according to an embodiment of the present application;
FIG. 1E is a schematic diagram of a prior art Wallace tree structure;
FIG. 1F is a schematic diagram of a Wallace tree according to an embodiment of the present application;
FIG. 1G is a schematic diagram illustrating a computing process of a computing device according to an embodiment of the present application;
FIG. 1H is a schematic diagram illustrating a computing process of another computing device according to an embodiment of the present application;
FIG. 2 is a block diagram of a combined processing apparatus according to an embodiment of the present application;
FIG. 3 is a block diagram of another combined processing apparatus according to an embodiment of the present application;
FIG. 3A is a schematic structural diagram of a board card according to an embodiment of the present application;
FIG. 4 is a flowchart of a computing method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Referring to FIG. 1A, FIG. 1A is a block diagram of a computing device according to an embodiment of the present application. The computing device includes a control unit 10 and an operation unit, where the operation unit includes a vector operation circuit 20, a first nonlinear operation circuit 30, an accumulation circuit 40 and a second nonlinear operation circuit 50, wherein:
A control unit 10, configured to receive a first control signal, and control a switching state of a part of operation circuits in the operation unit according to the first control signal, where the first control signal is an external control signal;
the control unit 10 is further configured to obtain input data, obtain a second control signal according to the input data, and send the second control signal to one or more operation circuits in the operation unit, where the second control signal is used to control an operator switch state of the one or more operation circuits;
and the operation unit is used for performing operation according to the received control signal to obtain an operation result.
Optionally, the vector operation circuit 20 may be a multiplication circuit composed of a plurality of multipliers or an addition circuit composed of a plurality of adders;
optionally, the first nonlinear operation circuit 30 may be a power-of-2 transformation circuit that computes $2^{x_i}$, where $x_i$ is a result of the vector operation;
optionally, the accumulation circuit 40 may be an addition tree composed of a plurality of adders or a Wallace compression tree composed of a plurality of adders;
optionally, the second nonlinear operation circuit 50 is a transformation circuit corresponding to an activation function of the neural network, for example, a Sigmoid function, a tanh function, a ReLU function, an ELU function, and the like.
It can be seen that, in the embodiment of the present application, when the computing device performs the neural network operation, if a part of the operation circuits are not needed to participate in the operation, the part of the operation circuits are selected to be turned off, so as to reduce the operation power consumption.
In a possible example, the device is implemented on a field programmable gate array (FPGA) or as an application-specific integrated circuit (ASIC), so that the first nonlinear operation circuit 30 and/or the second nonlinear operation circuit 50 can be removed at design time according to the application requirements, reducing the area or logic units and the power consumption required by the design. FIG. 1B is a block diagram of another computing device according to an embodiment of the present application, where the computing device includes a control unit 10 and an operation unit, and the operation unit includes a vector operation circuit 20 and an accumulation circuit 40;
optionally, the control unit 10 is configured to receive a first control signal and determine, according to the first control signal, whether to perform a nonlinear operation; when a nonlinear operation is performed, it is implemented by reusing (multiplexing) the existing operation circuits, for example the vector operation circuit 20 and/or the accumulation circuit 40.
In a possible example, the control unit 10 is configured to control the switching states of the first nonlinear operation circuit 30 and the second nonlinear operation circuit 50 according to the first control signal.
Specifically, some neural network operations do not need the first nonlinear mapping and/or the second nonlinear mapping. When such a neural network operation is performed, a first control signal is input, and the first nonlinear operation circuit and/or the second nonlinear operation circuit is turned off by the first control signal, thereby reducing the power consumption of the neural network operation.
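The effect of the first control signal can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit itself: the function name `run_datapath`, the two boolean flags standing in for the first control signal, the choice of multiplication for the vector operation, $2^{x}$ for the first nonlinear operation, and ReLU for the second nonlinear operation are all picked for illustration.

```python
def run_datapath(a, b, first_nonlinear_on, second_nonlinear_on):
    """Software model of the four circuits; the two flags stand in for the
    first control signal (hypothetical encoding, chosen for illustration)."""
    # Vector operation circuit 20: element-wise multiplication (one listed option).
    values = [x * y for x, y in zip(a, b)]
    # First nonlinear operation circuit 30: assumed to compute 2**x; skipped when off.
    if first_nonlinear_on:
        values = [2.0 ** v for v in values]
    # Accumulation circuit 40: sum of all lanes.
    total = sum(values)
    # Second nonlinear operation circuit 50: ReLU as the example activation; skipped when off.
    if second_nonlinear_on:
        total = max(0.0, total)
    return total


# Both nonlinear circuits off: a plain dot product.
plain = run_datapath([1.0, 2.0], [3.0, -4.0], False, False)
# Only the second nonlinear circuit on: the dot product followed by ReLU.
activated = run_datapath([1.0, 2.0], [3.0, -4.0], False, True)
```

In hardware the saving comes from not powering the skipped circuit; in this model the skipped branch simply does no work.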
In a possible example, the control unit 10 may obtain the second control signal according to the input data as follows: the second control signal is obtained when a component of the input data (an n-element vector) is judged to meet a judgment condition. The judgment condition is a threshold judgment condition, which includes: the component being less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range; or the component, after being mapped by a function, meeting a condition, such as the mapped value being equal to, greater than, or less than a given threshold, or lying within a given value range.
For example, when any component of the n-element vector is smaller than 0.01, 0.02, 0.03 or another value, the control signal corresponding to that component is set to the off state.
Further, the judgment condition may be dynamically adjusted according to the tolerable loss of precision in the neural network result, for example by adjusting the given threshold or value range, so as to reduce the operation power consumption as much as possible within a given precision range.
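A minimal sketch of the threshold judgment, assuming the "less than a given threshold" condition is applied to the magnitude of each component (the text does not say whether the sign is considered). The names `second_control_signal` and `gated_multiply` are hypothetical, and a switched-off multiplier is modelled as contributing zero.

```python
def second_control_signal(x, threshold=0.01):
    """One on/off flag per component: True = operator on, False = operator off.
    Assumption: a component counts as 'small' when its magnitude is below the threshold."""
    return [abs(v) >= threshold for v in x]


def gated_multiply(x, w, enable):
    """Model of the vector operation circuit with per-lane gating.
    A switched-off multiplier does no work and is modelled as outputting 0."""
    return [xi * wi if on else 0.0 for xi, wi, on in zip(x, w, enable)]


x = [0.5, 0.005, -0.25, 0.0]
w = [2.0, 2.0, 2.0, 2.0]
enable = second_control_signal(x, threshold=0.01)
products = gated_multiply(x, w, enable)
saved = enable.count(False)  # number of multipliers that were switched off
```

Raising `threshold` switches off more lanes at the cost of a larger approximation error, which is the trade-off the dynamic adjustment above is meant to tune.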
In a possible example, the control unit 10 sends the second control signal to one operation circuit; that is, the second control signal obtained by the control unit 10 only controls the on-off states of the operators of the vector operation circuit 20, while the on-off states of the operators of the first nonlinear operation circuit 30, the accumulation circuit 40 and the second nonlinear operation circuit 50 are each determined according to the on-off states of the preceding operation circuit. In other words, the control unit 10 controls the on-off states of the operators of the operation circuits in a pipelined, stage-by-stage manner.
For example, if the i-th component operator of the vector operation circuit receives a turn-off signal, that operator is turned off and a turn-off signal is sent to the first nonlinear operation circuit. If the accumulation circuit adopts an addition tree structure and the i-th component operator of the preceding operation circuit is in the off state, the adder in the first stage of the accumulation circuit that takes the i-th component as an input is also turned off. If the component operator of the preceding operation circuit corresponding to the other input of that adder (the (i±1)-th component) is in the on state, the addition result is that other input, and a control signal indicating that the adder cannot be turned off is transmitted to the second-stage adder; otherwise, the addition result is kept unchanged, and a control signal indicating that the adder can be turned off is transmitted to the second-stage adder. The second-stage adder receives the input data and the corresponding control signals, determines its on-off state according to the same rule, and transmits a control signal to the third-stage adder, and so on, until the on-off states of all adders are determined.
In the above possible example, when the first nonlinear operation circuit 30 is turned off and the second nonlinear operation circuit 50 is turned off: a control unit 10 for sending the second control signal to a vector operation circuit 20; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result and a third control signal, send the vector operation result and the third control signal to the accumulation circuit 40, and the third control signal is configured to control an operator switch state of the accumulation circuit 40; and the accumulation circuit 40 is used for accumulating the vector operation result according to the received third control signal to obtain an output result.
In the above possible example, when the first nonlinear operation circuit 30 is turned off but the second nonlinear operation circuit 50 is not turned off, the control unit 10 is configured to send the second control signal to the vector operation circuit 20; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result and a third control signal, and send the vector operation result and the third control signal to the accumulation circuit 40, where the third control signal is used to control the operator switch state of the accumulation circuit 40; the accumulation circuit 40 is configured to accumulate the vector operation result according to the received third control signal, obtain an accumulated result and a fourth control signal, and send the accumulated result and the fourth control signal to the second nonlinear operation circuit 50, where the fourth control signal is used to control the operator switch state of the second nonlinear operation circuit 50; the second nonlinear operation circuit 50 is configured to perform a second nonlinear operation on the accumulated result according to the received fourth control signal, so as to obtain an output result.
In the above possible examples, the control unit 10 is configured to send the second control signal to the vector operation circuit 20 when it is determined not to turn off the first nonlinear operation circuit 30 and not to turn off the second nonlinear operation circuit 50; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result and a third control signal, and send the vector operation result and the third control signal to the first nonlinear operation circuit 30, where the third control signal is used to control an operator switch state of the first nonlinear operation circuit 30; the first nonlinear operation circuit 30 is configured to perform a first nonlinear operation on the vector operation result according to the received third control signal, obtain a first nonlinear operation result and a fourth control signal, send the first nonlinear operation result and the fourth control signal to the accumulation circuit 40, and the fourth control signal is configured to control an operator switch state of the accumulation circuit 40; the accumulating circuit 40 is configured to accumulate the first nonlinear operation result according to the received fourth control signal, obtain an accumulated result and a fifth control signal, and send the accumulated result and the fifth control signal to the second nonlinear operation circuit 50, where the fifth control signal is used to control an operator switch state of the second nonlinear operation circuit 50; the second nonlinear operation circuit 50 is configured to perform a second nonlinear operation on the accumulated result according to the received fifth control signal, so as to obtain an output result.
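The stage-by-stage hand-off of control signals in the fully enabled case can be sketched as follows. The stage functions and their names are hypothetical. The sketch also assumes that a switched-off lane carries no value (modelled as `None`), that the first nonlinear operation is $2^{x}$, that the second is ReLU, and that the fifth control signal is off only when nothing was accumulated; none of these details are fixed by the text.

```python
def vector_stage(a, b, second_ctrl):
    """Vector operation circuit 20: gated multipliers; emits the third control signal."""
    result = [x * y if on else None for x, y, on in zip(a, b, second_ctrl)]
    third_ctrl = list(second_ctrl)  # an off multiplier switches off the operator it feeds
    return result, third_ctrl


def first_nonlinear_stage(values, third_ctrl):
    """First nonlinear operation circuit 30 (assumed 2**x); emits the fourth control signal."""
    result = [2.0 ** v if on else None for v, on in zip(values, third_ctrl)]
    return result, list(third_ctrl)


def accumulation_stage(values, fourth_ctrl):
    """Accumulation circuit 40: sums only the lanes that are on; emits the fifth control signal."""
    active = [v for v, on in zip(values, fourth_ctrl) if on]
    fifth_ctrl = bool(active)  # assumption: nothing accumulated means the next stage can stay off
    return sum(active), fifth_ctrl


def second_nonlinear_stage(total, fifth_ctrl):
    """Second nonlinear operation circuit 50 (ReLU as the example); off means output 0."""
    return max(0.0, total) if fifth_ctrl else 0.0


def run_pipelined(a, b, second_ctrl):
    v, c3 = vector_stage(a, b, second_ctrl)
    n, c4 = first_nonlinear_stage(v, c3)
    s, c5 = accumulation_stage(n, c4)
    return second_nonlinear_stage(s, c5)


out = run_pipelined([1.0, 0.0, 2.0], [3.0, 5.0, 0.5], [True, False, True])
```

Only the vector stage hears from the control unit here; every later stage derives its own on-off state from the signal handed to it by the stage before.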
In a possible example, the control unit 10 sends the second control signal to a plurality of operation circuits; that is, the control unit 10 centrally generates the second control signals that control the operator on-off states of the vector operation circuit 20, the first nonlinear operation circuit 30, the accumulation circuit 40, and the second nonlinear operation circuit 50.
For example, suppose the input data are two n-element vectors whose i-th and (i±1)-th components are all zero, and the vector operation circuit is an addition circuit. The control unit then sends a second control signal that turns off the operator to the i-th and (i±1)-th component operators of the vector operation circuit, to the i-th and (i±1)-th component operators of the first nonlinear operation circuit that are connected to them, and to the operator in the accumulation circuit 40 that is connected to the i-th and (i±1)-th component operators.
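A sketch of this centralized mode, in which the control unit derives every on-off flag directly from the input vectors. The function name `central_control`, the pairing of lanes 2j and 2j+1 at first-level adder j, and the rule that an adder is only on when both of its lanes are on are illustrative assumptions.

```python
def central_control(a, b):
    """Derive the second control signals for all circuits at once.
    The vector circuit is assumed to be an addition circuit, so lane i can be
    switched off when both a[i] and b[i] are zero."""
    lane_on = [not (x == 0 and y == 0) for x, y in zip(a, b)]
    vector_ctrl = list(lane_on)     # operators of the vector operation circuit
    nonlinear_ctrl = list(lane_on)  # matching operators of the first nonlinear circuit
    # First-level adder j of the accumulation circuit reads lanes 2j and 2j+1
    # (assumed pairing); it only has to add when both of its lanes are on.
    adder_ctrl = [lane_on[i] and lane_on[i + 1]
                  for i in range(0, len(lane_on) - 1, 2)]
    return vector_ctrl, nonlinear_ctrl, adder_ctrl


vec, nonlin, adders = central_control([1, 1, 0, 0], [0, 0, 0, 0])
```

Compared with the pipelined mode, all flags are available before the data enters the first stage, at the cost of the control unit computing and distributing more signals.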
In the above possible example, when the first nonlinear operation circuit 30 is turned off and the second nonlinear operation circuit 50 is turned off, the control unit 10 is configured to send the second control signal to the vector operation circuit 20 and the accumulation circuit 40, respectively; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result, and send the vector operation result to the accumulation circuit 40; and the accumulation circuit 40 is configured to accumulate the vector operation result according to the received second control signal, so as to obtain an output result.
In the above possible examples, when the first nonlinear operation circuit 30 is turned off but the second nonlinear operation circuit 50 is not turned off, the control unit 10 is configured to send the second control signals to the vector operation circuit 20, the accumulation circuit 40, and the second nonlinear operation circuit 50, respectively; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result, and send the vector operation result to the accumulation circuit 40; the accumulating circuit 40 is configured to accumulate the vector operation result according to the received second control signal to obtain an accumulated result, and send the accumulated result to the second nonlinear operation circuit 50; the second nonlinear operation circuit 50 is configured to perform a second nonlinear operation on the accumulated result according to the received second control signal, so as to obtain an output result.
In the above possible example, when the first nonlinear operation circuit 30 is not turned off but the second nonlinear operation circuit 50 is turned off, the control unit 10 is configured to send the second control signals to the vector operation circuit 20, the first nonlinear operation circuit 30, and the accumulation circuit 40, respectively; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result, and send the vector operation result to the first nonlinear operation circuit 30; a first nonlinear operation circuit 30, configured to perform a first nonlinear operation on the vector operation result according to the received second control signal, obtain a first nonlinear operation result, and send the first nonlinear operation result to an accumulation circuit 40; the accumulating circuit 40 is configured to accumulate the first nonlinear operation result according to the received second control signal, so as to obtain an output result.
In the above possible examples, the control unit 10 is configured to send the second control signals to the vector operation circuit 20, the first nonlinear operation circuit 30, the accumulation circuit 40, and the second nonlinear operation circuit 50, respectively, when the first nonlinear operation circuit 30 is not turned off and the second nonlinear operation circuit 50 is not turned off; the vector operation circuit 20 is configured to obtain input data, perform vector operation on the input data according to the received second control signal, obtain a vector operation result, and send the vector operation result to the first nonlinear operation circuit 30; a first nonlinear operation circuit 30, configured to perform a first nonlinear operation on the vector operation result according to the received second control signal, obtain a first nonlinear operation result, and send the first nonlinear operation result to an accumulation circuit 40; the accumulating circuit 40 is configured to accumulate the first nonlinear operation result according to the received second control signal, obtain an accumulated result, and send the accumulated result to the second nonlinear operation circuit 50; the second nonlinear operation circuit 50 is configured to perform a second nonlinear operation on the accumulated result according to the received second control signal, so as to obtain an output result.
The process of turning off the operator will be specifically described below by taking an accumulation circuit as an example.
FIG. 1C shows an exemplary structure of a conventional addition tree used for accumulation in the prior art. As shown in FIG. 1C, the adders in this addition tree are always on regardless of the input data, so power consumption is high. As shown in FIG. 1D, in the addition tree provided in the embodiment of the present application, a corresponding control signal is transmitted to each adder together with the input data to turn that adder on or off, so that some of the adders are turned off and power consumption is reduced. Turning the adders on or off may be implemented as follows: if the control signals corresponding to both input data received by an adder are in the on state, the adder performs the addition and transmits the addition result, together with a signal representing the on state, to the next-stage adder; if the control signals corresponding to both input data are in the off state, the adder is turned off, the data transmitted to the next-stage adder is kept unchanged, and a signal representing the off state is transmitted to the next-stage adder; if the control signal corresponding to exactly one of the two input data is in the off state, the adder is turned off, the data transmitted to the next-stage adder is the input data whose control signal is in the on state, and a signal representing the on state is transmitted to the next-stage adder; and so on, until the addition tree has accumulated the final result.
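The three gating rules can be modelled in software as below. This is a behavioural sketch only: the name `gated_adder_tree` is hypothetical, an off lane's "unchanged" data is represented by a placeholder 0, and an unpaired input at the end of a level is assumed to pass straight to the next level.

```python
def gated_adder_tree(values, enabled):
    """Sum `values` with a binary adder tree whose adders are gated by on/off flags.
    Returns (total, number_of_adders_that_actually_performed_an_addition)."""
    level = list(zip(values, enabled))
    if not level:
        return 0, 0
    active_adders = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (x, x_on), (y, y_on) = level[i], level[i + 1]
            if x_on and y_on:
                nxt.append((x + y, True))   # both on: add, pass "on" downstream
                active_adders += 1
            elif x_on:
                nxt.append((x, True))       # one on: adder off, forward the live input
            elif y_on:
                nxt.append((y, True))
            else:
                nxt.append((0, False))      # both off: adder off, pass "off" downstream
        if len(level) % 2:
            nxt.append(level[-1])           # unpaired input goes to the next level as is
        level = nxt
    value, on = level[0]
    return (value if on else 0), active_adders


total, used = gated_adder_tree(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [True, False, False, False, True, True, False, True],
)
```

With eight inputs an ungated tree always uses seven adders; in the usage above only the lanes holding 1, 5, 6 and 8 are live, so fewer additions are performed while the sum of the live lanes is unchanged.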
FIG. 1E shows an example of a Wallace tree composed of full adders and half adders, which can compress 16 bits into 2 bits. In this example, the accumulation circuit implements accumulation as follows: the Wallace tree first compresses the plurality of input data into two partial sums, and an adder then accumulates the partial sums to obtain the accumulation result. The Wallace tree comprises n layers, wherein: the first layer has $l$ full adders, the second layer has $\lceil 2l/3 \rceil$ full adders, ..., the m-th layer has $\lceil 2^{m-1} l / 3^{m-1} \rceil$ full adders, and the last layer (i.e. the n-th layer) has a carry look-ahead adder; where l, m, and n are integers greater than 1, m is less than n, and $\lceil x \rceil$ denotes the rounding operation on data x. The specific working process is described below. Assume that the input data is a 0/1 vector whose number of 1s is to be counted, and that the vector has a fixed length of 3l, where l is an integer greater than 1. The first layer has l full adders, each with three 1-bit inputs and two 2-bit outputs, so the first layer produces 2l 2-bit outputs in total. The second layer has $\lceil 2l/3 \rceil$ full adders, each with three 2-bit inputs and two 3-bit outputs, so the second layer produces 4l/3 3-bit outputs in total. In the same way, each full adder in every layer has 3 inputs and 2 outputs, the outputs are one bit wider than the inputs, and the full adders within a layer can operate in parallel. Finally, two n-bit outputs are obtained, where n is the number of layers and is an integer greater than 1; the two n-bit outputs are added by the carry look-ahead adder to obtain one n-bit output, namely the number of 1s in this 0/1 vector.
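The compress-then-add idea can be illustrated with a word-level model of 3:2 compression. This is a sketch under stated assumptions: the function names `csa` and `wallace_popcount` are hypothetical, Python integers stand in for the multi-bit wires, leftover values that do not fill a group of three are passed to the next layer unchanged, and the final `sum` plays the role of the carry look-ahead adder.

```python
from typing import List, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compressor (carry-save adder): a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1   # majority bits, shifted left
    return s, carry


def wallace_popcount(bits: List[int]) -> int:
    """Count the 1s in a 0/1 vector by repeated 3:2 compression."""
    values = list(bits)
    while len(values) > 2:
        grouped = len(values) - len(values) % 3
        next_values = []
        # Every group of three values is compressed to two; the groups are
        # independent, so hardware can evaluate them in parallel.
        for i in range(0, grouped, 3):
            next_values.extend(csa(values[i], values[i + 1], values[i + 2]))
        next_values.extend(values[grouped:])     # leftovers pass through
        values = next_values
    # Two partial sums remain: one ordinary (carry-propagating) addition.
    return sum(values)
```

Each pass shrinks the list by roughly a factor of 2/3, which matches the layer sizes given in the text, and only the last step needs carry propagation.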
The adder can increase the parallelism of addition calculation and effectively improve the operation speed of the operation module.
It can be seen that in the Wallace tree shown in FIG. 1E the full adders/half adders are always in the on state regardless of the input data, so power consumption is high. As shown in FIG. 1F, a control signal is input together with the data. For example, when the control signals of all the data to be accumulated that are input to a full adder/half adder of the Wallace tree are in the off state, the full adder/half adder at the corresponding position is turned off, and a signal representing the off state is transmitted to the next stage; if only part of the control signals of the data to be accumulated that are input to a full adder/half adder of the Wallace tree are in the off state, the inputs whose control signals represent the off state are set to 0, the full adder/half adder at the corresponding position is turned on, and a signal representing the on state is transmitted to the next stage.
The following takes the neural network operation $y = f(\sum_i W_i * x_i + b)$ as an example to describe the calculation process of the computing device of the present application in detail.
Example 1 (vector operation Circuit is a multiplication Circuit comprising a plurality of multipliers)
As shown in fig. 1G, the first nonlinear operation circuit is not included, and the accumulation circuit is composed of an addition tree, and the control unit integrally controls the operators of the vector operation circuit, i.e., sends out a control signal to each multiplier.
The operation process is as follows: the input data x and the weight W are input into the control unit and the vector operation circuit simultaneously. The control unit evaluates the input data, generates the corresponding control signals, and sends them to the multipliers of the vector operation circuit. For example, when the component $x_i$ of the input data is judged to satisfy a given condition, a control signal for turning off the multiplier is obtained; on receiving this control signal, the multiplier turns off and transmits a control signal representing the off state to the adder connected to it. Otherwise, the multiplier performs $W_i * x_i$ and transmits the multiplication result and a control signal representing the on state to the adder of the accumulation circuit connected to it. If the control signals corresponding to both input data received by an adder of the accumulation circuit are in the on state, the adder performs the addition and transmits the addition result and a control signal representing the on state to the next-stage adder; if the control signals corresponding to both input data are in the off state, the adder is turned off, the data transmitted to the next-stage adder is kept unchanged, and a control signal representing the off state is transmitted to the next-stage adder; if the control signal corresponding to exactly one of the two input data is in the off state, the adder is turned off, and the input data whose control signal is in the on state is transmitted to the next-stage adder together with a control signal representing the on state. This continues until the addition tree has accumulated the accumulation result, which is sent to the second nonlinear operation circuit to perform the nonlinear operation f and obtain the output result y.
As can be seen, the computing device provided by this example decides whether to turn each multiplier or adder on or off, for instance turning off one whose input is smaller than a given threshold, and thereby reduces power consumption during operation without losing the speed of the parallel operation.
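A functional sketch of Example 1 shows what the gating does to the computed value. The text only speaks of a "given condition"; the condition used here (the magnitude of $x_i$ is below `threshold`), the place where the bias b is added, and the name `gated_neuron` are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def gated_neuron(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    threshold: float,
    f: Callable[[float], float],
) -> Tuple[float, int]:
    """Compute f(sum_i w_i * x_i + b), skipping gated multipliers.

    Returns the output and the number of multipliers that were on.
    """
    total = b                      # assumption: bias enters the accumulation
    active_multipliers = 0
    for xi, wi in zip(x, w):
        if abs(xi) < threshold:    # hypothetical "given condition"
            continue               # multiplier off: nothing is accumulated
        total += wi * xi
        active_multipliers += 1
    return f(total), active_multipliers


def relu(v: float) -> float:
    return max(v, 0.0)


y, active = gated_neuron(
    x=[0.0, 2.0, 0.001, -1.0],
    w=[5.0, 0.5, 100.0, -2.0],
    b=1.0,
    threshold=0.01,
    f=relu,
)
print(y, active)  # 4.0 2
```

In this example the ungated result would be 4.1, so a nonzero threshold trades a small approximation error for two skipped multiplications; with an exact-zero condition the result is unchanged.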
Example 2 (vector operation Circuit is an adder circuit comprising a plurality of adders)
As shown in FIG. 1H, in this embodiment the vector operation circuit is composed of n adders; the first nonlinear operation circuit is composed of n power-of-2 conversion circuits, each computing $2^{x_i}$ (where $x_i$ is a result of the vector operation circuit); and the accumulation circuit is composed of an addition tree. The control unit controls the vector operation circuit as a whole and can send a control signal to each adder. The nonlinear conversion circuit corresponding to each adder receives the control signal in addition to the data signal and sends the control signal to the next stage.
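Since the vector operation circuit is an addition circuit, the neural network operation is transformed by taking base-2 logarithms. Let $g = \log_2(y)$, $h_i = \log_2(x_i)$, and $w'_i = \log_2 |w_i|$. The magnitude of each product in the formula $y = f(\sum_i W_i * x_i + b)$ then becomes $|w_i| * x_i = 2^{h_i + w'_i}$, so that each multiplication is converted into the addition $h_i + w'_i$ followed by a power-of-2 conversion.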
After the transformation, the specific operation process is as follows: the input data h (an n-element vector) and the weight W are input into the control unit and the vector operation circuit simultaneously. The control unit evaluates the input data h and the weight W, generates the corresponding control signals, and sends them to the adders of the vector operation circuit. When the component $h_i$ of the input data and the corresponding weight component $w'_i$ are judged to satisfy a given condition, a control signal representing the off state is sent to the adder corresponding to $h_i$ and $w'_i$; on receiving this control signal, the adder turns off and transmits a control signal representing the off state to the first nonlinear conversion circuit connected to it. Otherwise, the adder performs the addition and transmits the addition result $(h_i + w'_i)$ and a control signal representing the on state to the first nonlinear conversion circuit. If the control signal received by the first nonlinear conversion circuit is in the on state, the circuit performs the first nonlinear operation $2^{h_i + w'_i}$ on the addition result and transmits the operation result and a signal representing the on state to the adder of the accumulation circuit connected to it; if the control signal received by the first nonlinear conversion circuit is in the off state, the circuit turns off, the data transmitted to the adder is kept unchanged, and a control signal representing the off state is transmitted to the adder. If the control signals corresponding to both input data received by an adder of the accumulation circuit are in the on state, the adder performs the addition and transmits the addition result and a control signal representing the on state to the next-stage adder; if the control signals corresponding to both input data are in the off state, the adder is turned off, the data transmitted to the next-stage adder is kept unchanged, and a control signal representing the off state is transmitted to the next-stage adder; if the control signal corresponding to exactly one of the two input data is in the off state, the adder is turned off, and the input data whose control signal is in the on state is transmitted to the next-stage adder together with a control signal representing the on state. This continues until the addition tree obtains the accumulation result, which is transmitted to the second nonlinear operation circuit for the nonlinear mapping f to obtain the output result g.
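The log-domain datapath of Example 2 can be checked numerically with a short sketch. Several points are assumptions not stated in the text: the function takes x and w and forms $h_i$ and $w'_i$ itself, the sign of each weight is restored after the power-of-2 conversion, operands for which the logarithm is undefined are gated off, and the "given condition" is modeled by the hypothetical parameter `min_exponent`.

```python
import math
from typing import Sequence, Tuple


def log_domain_accumulate(
    x: Sequence[float],
    w: Sequence[float],
    min_exponent: float = -20.0,
) -> Tuple[float, int]:
    """Accumulate sum_i w_i * x_i using adders and power-of-2 conversion.

    Returns the accumulated value and the number of adders that were on.
    """
    total = 0.0
    active_adders = 0
    for xi, wi in zip(x, w):
        if xi <= 0.0 or wi == 0.0:
            continue                       # log2 undefined: gate the adder off
        h_i = math.log2(xi)
        w_prime_i = math.log2(abs(wi))
        exponent = h_i + w_prime_i         # vector operation circuit: one add
        if exponent < min_exponent:
            continue                       # hypothetical gating condition
        magnitude = 2.0 ** exponent        # first nonlinear operation
        total += math.copysign(magnitude, wi)  # sign handling is an assumption
        active_adders += 1
    return total, active_adders
```

For positive inputs the returned sum agrees with the ordinary dot product up to floating-point rounding, which is the property that lets adders replace multipliers in this embodiment.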
The application also discloses a machine learning operation device comprising one or more computing devices, which is used for acquiring data to be operated on and control information from other processing devices, executing a specified machine learning operation, and transmitting the execution result to peripheral equipment through an I/O interface. Peripheral equipment includes, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one computing device is included, the computing devices may be linked and transfer data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale machine learning operations. In this case, the computing devices may share one control system or have independent control systems, and may share memory or each have their own memory. In addition, the interconnection may use any interconnection topology.
The machine learning operation device has higher compatibility and can be connected with various types of servers through PCIE interfaces.
The application also discloses a combined processing device which comprises the machine learning operation device, a universal interconnection interface and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user. FIG. 2 is a schematic diagram of a combination processing apparatus.
The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPU), graphics processing units (GPU), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping; the other processing devices may also cooperate with the machine learning operation device to complete computing tasks.
The universal interconnection interface is used for transmitting data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning operation device; it may obtain control instructions from the other processing devices and write them into the on-chip control cache of the machine learning operation device; it may also read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.
Optionally, as shown in FIG. 3, the structure may further include a storage device, which is connected to the machine learning operation device and to the other processing devices, respectively. The storage device is used for storing data of the machine learning operation device and the other processing devices, and is particularly suitable when the data to be operated on cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.
The combined processing device can be used as the SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, which effectively reduces the core area of the control part, improves the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards, and WiFi interfaces.
In some embodiments, a chip is also disclosed, which includes the machine learning computing device or the combination processing device.
In some embodiments, a chip package structure is disclosed, which includes the chip.
In some embodiments, a board card is provided that includes the chip package structure described above. Referring to fig. 3A, fig. 3A provides a board that may include other mating components in addition to the chip 389, including but not limited to: a memory device 390, an interface device 391 and a control device 392;
The memory device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The memory device may include multiple sets of memory cells 393. Each set of memory cells is connected to the chip through a bus. It can be understood that each set of memory cells may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 sets of memory cells, and each set may include a plurality of DDR4 chips. In one embodiment, the chip may include four 72-bit DDR4 controllers, in which 64 of the 72 bits are used for data transfer and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each set of memory cells, the theoretical data transfer bandwidth can reach 25600 MB/s.
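The 25600 MB/s figure follows from the transfer rate and the data width. The short calculation below is a sketch, with the helper name `ddr_bandwidth_mb_per_s` chosen for illustration; it counts only the 64 data bits of each 72-bit controller, since the 8 ECC bits carry no payload.

```python
def ddr_bandwidth_mb_per_s(transfers_mt_per_s: int, data_bits: int) -> float:
    """Theoretical bandwidth: transfers per second times bytes per transfer."""
    bytes_per_transfer = data_bits / 8
    return transfers_mt_per_s * bytes_per_transfer


# DDR4-3200 performs 3200 million transfers per second on a 64-bit data path.
print(ddr_bandwidth_mb_per_s(3200, 64))  # 25600.0
```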
In one embodiment, each set of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.
The interface device is electrically connected with the chip in the chip package structure. The interface device is used for data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, and the data to be processed is transferred from the server to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present application does not limit the specific form of the other interface, provided that the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g. a server) by the interface device.
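The 16000 MB/s figure can be reproduced from the PCIe 3.0 signalling rate of 8 GT/s per lane. The sketch below assumes a 16-lane link, which is the lane count that yields 16000 MB/s at that rate; the function names are illustrative. It also shows the slightly lower payload rate once the 128b/130b line encoding of PCIe 3.0 is taken into account.

```python
def pcie_raw_mb_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw link rate: one bit per transfer per lane, 8 bits per byte."""
    return gt_per_s * 1000 * lanes / 8


def pcie3_effective_mb_per_s(lanes: int) -> float:
    """Payload rate of PCIe 3.0 after 128b/130b encoding overhead."""
    return pcie_raw_mb_per_s(8.0, lanes) * 128 / 130


print(pcie_raw_mb_per_s(8.0, 16))  # 16000.0
```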
The control device is electrically connected with the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads; the chip can therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores, and/or the plurality of processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an aircraft, a ship, and/or a motor vehicle; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas cooker, and a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
Referring to fig. 4, fig. 4 is a flowchart of a computing method according to an embodiment of the present application, where the method is applied to a computing device, and the computing device includes a control unit and an operation unit, and the method includes:
Step S401, the control unit receives a first control signal and controls the on-off state of part of the operation circuits in the operation unit according to the first control signal.
Step S402, the control unit obtains input data, obtains a second control signal according to the input data, and sends the second control signal to one or more operation circuits in the operation unit, wherein the second control signal is used for controlling the on-off states of the operators of the one or more operation circuits.
Step S403, the operation unit performs the operation according to the received control signal to obtain an operation result.
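Steps S401 to S403 distinguish two levels of control: the first control signal switches whole operation circuits, while the second control signal, derived from the input data, switches individual operators. A minimal sketch of that split follows. The names `FirstControl` and `compute`, the zero-threshold rule used to derive the second control signal, and the use of multiplication as the vector operation are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class FirstControl:
    """Circuit-level switches carried by the first control signal."""
    first_nonlinear_on: bool
    second_nonlinear_on: bool


def compute(
    x: Sequence[float],
    w: Sequence[float],
    first: FirstControl,
    nl1: Callable[[float], float],
    nl2: Callable[[float], float],
    threshold: float = 0.0,
) -> float:
    # S401: the first control signal fixes which circuits take part.
    # S402: the second control signal has one on/off bit per operator;
    #       here an operator is on when |x_i| exceeds the threshold.
    second: List[bool] = [abs(xi) > threshold for xi in x]
    # S403: only operators that are on contribute a result.
    values = [wi * xi for xi, wi, on in zip(x, w, second) if on]
    if first.first_nonlinear_on:
        values = [nl1(v) for v in values]
    result = sum(values)
    if first.second_nonlinear_on:
        result = nl2(result)
    return result
```

Calling `compute` with both nonlinear circuits off reduces it to a gated multiply-accumulate, which corresponds to the configuration in which only the vector operation circuit and the accumulation circuit are active.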
In a possible example, the operation unit includes a vector operation circuit, a first nonlinear operation circuit, an accumulation circuit, and a second nonlinear operation circuit; the control unit controls the switching states of the first nonlinear operation circuit and the second nonlinear operation circuit according to the first control signal.
In a possible example, when the apparatus is integrated by a field programmable gate array FPGA or an application specific integrated circuit ASIC, the operation unit includes a vector operation circuit and an accumulation circuit, the control unit receives a first control signal, determines whether to perform a nonlinear operation according to the first control signal, and performs the nonlinear operation by multiplexing the operation circuit when performing the nonlinear operation.
In a possible example, the control unit sends the second control signal to the vector operation circuit when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off; the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit; and the accumulation circuit accumulates vector operation results according to the received third control signal to obtain output results.
In a possible example, the control unit sends the second control signal to the vector operation circuit and the accumulation circuit, respectively, when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off; the vector operation circuit acquires input data, performs vector operation on the input data according to the received second control signal to obtain a vector operation result, and sends the vector operation result to the accumulation circuit; and the accumulation circuit accumulates the vector operation result according to the received second control signal to obtain an output result.
In a possible example, the control unit sends the second control signal to the vector operation circuit when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off; the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit; the accumulating circuit accumulates the vector operation result according to the received third control signal to obtain an accumulated result and a fourth control signal, the accumulated result and the fourth control signal are sent to the second nonlinear operation circuit, and the fourth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit; and the second nonlinear operation circuit performs second nonlinear operation on the accumulated result according to the received fourth control signal to obtain an output result.
In a possible example, when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off, the control unit sends the second control signal to the vector operation circuit, the accumulation circuit, and the second nonlinear operation circuit, respectively; the vector operation circuit acquires input data, performs vector operation on the input data according to the received second control signal to obtain a vector operation result, and sends the vector operation result to the accumulation circuit; the accumulating circuit accumulates the vector operation result according to the received second control signal to obtain an accumulated result, and sends the accumulated result to the second nonlinear operation circuit; and the second nonlinear operation circuit performs second nonlinear operation on the accumulated result according to the received second control signal to obtain an output result.
In one possible example, the control unit sends the second control signal to the vector operation circuit when it is determined that the first nonlinear operation circuit is not turned off but the second nonlinear operation circuit is turned off;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit; the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, the first nonlinear operation result and the fourth control signal are sent to the accumulation circuit, and the fourth control signal is used for controlling the on-off state of an arithmetic unit of the accumulation circuit; and the accumulation circuit accumulates the first nonlinear operation result according to the received fourth control signal to obtain an output result.
In a possible example, when it is determined that the first nonlinear operation circuit is not turned off but the second nonlinear operation circuit is turned off, the control unit sends the second control signal to the vector operation circuit, the first nonlinear operation circuit, and the accumulation circuit, respectively; the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result, and sends the vector operation result to the first nonlinear operation circuit; the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sends the first nonlinear operation result to the accumulation circuit; and the accumulation circuit accumulates the first nonlinear operation result according to the received second control signal to obtain an output result.
In one possible example, the control unit sends the second control signal to the vector operation circuit when it is determined not to turn off the first nonlinear operation circuit and not to turn off the second nonlinear operation circuit; the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit; the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, the first nonlinear operation result and the fourth control signal are sent to the accumulation circuit, and the fourth control signal is used for controlling the on-off state of an arithmetic unit of the accumulation circuit; the accumulation circuit accumulates the first nonlinear operation result according to the received fourth control signal to obtain an accumulation result and a fifth control signal, the accumulation result and the fifth control signal are sent to the second nonlinear operation circuit, and the fifth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit; and the second nonlinear operation circuit performs second nonlinear operation on the accumulated result according to the received fifth control signal to obtain an output result.
In a possible example, the control unit sends the second control signal to the vector operation circuit, the first nonlinear operation circuit, the accumulation circuit, and the second nonlinear operation circuit, respectively, when it is determined that the first nonlinear operation circuit is not turned off and the second nonlinear operation circuit is not turned off; the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result, and sends the vector operation result to the first nonlinear operation circuit; the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sends the first nonlinear operation result to the accumulation circuit; the accumulation circuit accumulates the first nonlinear operation result according to the received second control signal to obtain an accumulation result, and sends the accumulation result to the second nonlinear operation circuit; and the second nonlinear operation circuit performs second nonlinear operation on the accumulated result according to the received second control signal to obtain an output result.
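The examples above describe two ways of distributing control: either the control unit sends the second control signal to every circuit directly, or each circuit derives the control signal for the next circuit (third, fourth, fifth) and passes it along with its result. The chained variant can be sketched as below. How each stage derives the next control signal is not specified in the text, so the rules used here (an off operator keeps its downstream operator off, a gated operator outputs 0.0, and the second nonlinear operator is on when any accumulated input was on) are assumptions, as are all function names.

```python
from typing import Callable, List, Sequence, Tuple


def vector_stage(
    x: Sequence[float], w: Sequence[float], second_ctrl: Sequence[bool]
) -> Tuple[List[float], List[bool]]:
    """Gated multipliers; emits results and the third control signal."""
    results = [wi * xi if on else 0.0 for xi, wi, on in zip(x, w, second_ctrl)]
    third_ctrl = list(second_ctrl)      # assumption: off stays off downstream
    return results, third_ctrl


def first_nonlinear_stage(
    values: Sequence[float], third_ctrl: Sequence[bool], op: Callable[[float], float]
) -> Tuple[List[float], List[bool]]:
    """Gated nonlinear operators; emits the fourth control signal."""
    results = [op(v) if on else 0.0 for v, on in zip(values, third_ctrl)]
    return results, list(third_ctrl)


def accumulation_stage(
    values: Sequence[float], fourth_ctrl: Sequence[bool]
) -> Tuple[float, bool]:
    """Accumulate only on-state inputs; emits the fifth control signal."""
    total = sum(v for v, on in zip(values, fourth_ctrl) if on)
    return total, any(fourth_ctrl)


def run_chain(
    x: Sequence[float],
    w: Sequence[float],
    second_ctrl: Sequence[bool],
    op1: Callable[[float], float],
    op2: Callable[[float], float],
) -> float:
    products, third = vector_stage(x, w, second_ctrl)
    mapped, fourth = first_nonlinear_stage(products, third, op1)
    total, fifth = accumulation_stage(mapped, fourth)
    return op2(total) if fifth else 0.0  # second nonlinear operator gated
```

In the direct variant the same stage functions could be reused, with the control unit supplying each stage's control signal itself instead of the preceding stage.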
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, such as the division of the units, merely a logical function division, and there may be additional manners of dividing the actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in part, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the present application have been described in detail above, and specific examples have been used to explain its principles and implementations; the above description of the embodiments is provided solely to facilitate understanding of the method and core concepts of the application. Meanwhile, those of ordinary skill in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (24)

1. A computing device, characterized in that the computing device comprises a control unit and an operation unit;
the control unit is used for receiving a first control signal and controlling the on-off state of some of the operation circuits in the operation unit according to the first control signal;
the control unit is further used for acquiring input data, obtaining a second control signal according to the input data, and sending the second control signal to one or more operation circuits in the operation unit, wherein the second control signal is used for controlling the on-off state of an operator of the one or more operation circuits;
the operation unit is used for performing operation according to the received control signal to obtain an operation result.
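As an illustration of how a second control signal derived from the input data might gate individual operators, the following Python sketch models one possibility. The rule that an operator is switched off when its input element is zero, the use of element-wise multiplication as the operation, and the names `second_control_signal` and `gated_multiply` are assumptions made for illustration; claim 1 only requires that the signal be obtained from the input data and that it control the on-off state of the operators.

```python
from typing import List, Sequence, Tuple


def second_control_signal(input_data: Sequence[float]) -> List[bool]:
    """Assumed rule: operator i is switched on only if input element i is nonzero."""
    return [x != 0 for x in input_data]


def gated_multiply(
    input_data: Sequence[float],
    weights: Sequence[float],
    enable: Sequence[bool],
) -> Tuple[List[float], int]:
    """Model one multiplier per element; a switched-off operator does no work
    and contributes 0.0 to the result."""
    results: List[float] = []
    active = 0
    for x, w, on in zip(input_data, weights, enable):
        if on:
            results.append(x * w)
            active += 1
        else:
            results.append(0.0)
    return results, active


# Hypothetical input data and weights.
inputs = [0.0, 2.0, 0.0, -1.5]
weights = [3.0, 0.5, 7.0, 2.0]

signal = second_control_signal(inputs)
products, active_ops = gated_multiply(inputs, weights, signal)
print(signal, products, active_ops)
```

In this sketch only two of the four modelled operators are switched on, while the numerical result is the same as an ungated multiplication.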
2. The apparatus of claim 1, wherein, when the apparatus is integrated in a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), the operation unit comprises a vector operation circuit and an accumulation circuit;
the control unit is used for receiving the first control signal, determining whether to execute a nonlinear operation according to the first control signal, and, when the nonlinear operation is executed, realizing the nonlinear operation by multiplexing the operation circuits.
3. The apparatus of claim 1, wherein the operation unit comprises a vector operation circuit, a first nonlinear operation circuit, an accumulation circuit, and a second nonlinear operation circuit;
the control unit is used for controlling the on-off states of the first nonlinear operation circuit and the second nonlinear operation circuit according to the first control signal.
4. The apparatus of claim 3, wherein, when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off,
the control unit is used for sending the second control signal to the vector operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result and a third control signal, and sending the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit;
and the accumulation circuit is used for accumulating the vector operation result according to the received third control signal to obtain an output result.
5. The apparatus of claim 3, wherein, when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off,
the control unit is used for respectively sending the second control signals to the vector operation circuit and the accumulation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result, and sending the vector operation result to the accumulation circuit;
and the accumulation circuit is used for accumulating the vector operation result according to the received second control signal to obtain an output result.
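Claims 4 and 5 differ in how the control signal reaches the accumulation circuit: in claim 4 the vector operation circuit produces a third control signal and passes it on, while in claim 5 the control unit sends the second control signal to both circuits. The Python sketch below contrasts the two. It assumes, purely for illustration, that the vector operation is an element-wise multiplication and that every control signal marks nonzero values as "on"; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def derive_signal(values: Sequence[float]) -> List[bool]:
    """Assumed rule: an operator is switched on only for a nonzero value."""
    return [v != 0 for v in values]


def vector_stage(
    x: Sequence[float], w: Sequence[float], signal: Sequence[bool], emit_next: bool
) -> Tuple[List[float], Optional[List[bool]]]:
    """Element-wise multiply; optionally derive the next control signal."""
    result = [xi * wi if on else 0.0 for xi, wi, on in zip(x, w, signal)]
    next_signal = derive_signal(result) if emit_next else None
    return result, next_signal


def accumulate_stage(
    values: Sequence[float], signal: Sequence[bool]
) -> Tuple[float, int]:
    """Sum only the enabled values; also report how many adders were on."""
    active = [v for v, on in zip(values, signal) if on]
    return sum(active), len(active)


def cascaded(x: Sequence[float], w: Sequence[float]) -> Tuple[float, int]:
    """Claim 4 style: the vector stage hands a third signal to the accumulator."""
    second = derive_signal(x)
    products, third = vector_stage(x, w, second, emit_next=True)
    assert third is not None
    return accumulate_stage(products, third)


def broadcast(x: Sequence[float], w: Sequence[float]) -> Tuple[float, int]:
    """Claim 5 style: the same second signal drives both stages."""
    second = derive_signal(x)
    products, _ = vector_stage(x, w, second, emit_next=False)
    return accumulate_stage(products, second)


# Hypothetical operands.
x = [1.0, 0.0, 2.0]
w = [0.0, 5.0, 3.0]
print(cascaded(x, w), broadcast(x, w))
```

Under these assumptions both variants produce the same sum, but the cascaded variant can switch off an adder whose product turned out to be zero even though its input was not.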
6. The apparatus of claim 3, wherein, when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off,
the control unit is used for sending the second control signal to the vector operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result and a third control signal, and sending the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit;
the accumulation circuit is used for accumulating the vector operation result according to the received third control signal to obtain an accumulation result and a fourth control signal, and sending the accumulation result and the fourth control signal to the second nonlinear operation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit;
and the second nonlinear operation circuit is used for performing second nonlinear operation on the accumulation result according to the received fourth control signal to obtain an output result.
7. The apparatus of claim 3, wherein when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off,
the control unit is used for respectively sending the second control signals to the vector operation circuit, the accumulation circuit and the second nonlinear operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result, and sending the vector operation result to the accumulation circuit;
the accumulation circuit is used for accumulating the vector operation result according to the received second control signal to obtain an accumulation result, and sending the accumulation result to the second nonlinear operation circuit;
and the second nonlinear operation circuit is used for carrying out second nonlinear operation on the accumulation result according to the received second control signal to obtain an output result.
8. The apparatus of claim 3, wherein when the first nonlinear operation circuit is not turned off but the second nonlinear operation circuit is turned off,
the control unit is used for sending the second control signal to the vector operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result and a third control signal, and sending the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit;
the first nonlinear operation circuit is used for performing first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, and sending the first nonlinear operation result and the fourth control signal to the accumulation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the accumulation circuit;
and the accumulation circuit is used for accumulating the first nonlinear operation result according to the received fourth control signal to obtain an output result.
9. The apparatus of claim 3, wherein when the first nonlinear operation circuit is not turned off but the second nonlinear operation circuit is turned off,
the control unit is used for respectively sending the second control signals to the vector operation circuit, the first nonlinear operation circuit and the accumulation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result, and sending the vector operation result to the first nonlinear operation circuit;
the first nonlinear operation circuit is used for carrying out first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sending the first nonlinear operation result to the accumulation circuit;
and the accumulation circuit is used for accumulating the first nonlinear operation result according to the received second control signal to obtain an output result.
10. The apparatus of claim 3, wherein when the first nonlinear operation circuit is not turned off and the second nonlinear operation circuit is not turned off,
the control unit is used for sending the second control signal to the vector operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result and a third control signal, and sending the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit;
the first nonlinear operation circuit is used for performing first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, and sending the first nonlinear operation result and the fourth control signal to the accumulation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the accumulation circuit;
the accumulation circuit is used for accumulating the first nonlinear operation result according to the received fourth control signal to obtain an accumulation result and a fifth control signal, and sending the accumulation result and the fifth control signal to the second nonlinear operation circuit, wherein the fifth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit;
and the second nonlinear operation circuit is used for performing second nonlinear operation on the accumulation result according to the received fifth control signal to obtain an output result.
11. The apparatus of claim 3, wherein when the first nonlinear operation circuit is not turned off and the second nonlinear operation circuit is not turned off,
the control unit is used for respectively sending the second control signals to the vector operation circuit, the first nonlinear operation circuit, the accumulation circuit and the second nonlinear operation circuit;
the vector operation circuit is used for acquiring input data, carrying out vector operation on the input data according to the received second control signal to obtain a vector operation result, and sending the vector operation result to the first nonlinear operation circuit;
the first nonlinear operation circuit is used for carrying out first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sending the first nonlinear operation result to the accumulation circuit;
the accumulation circuit is used for accumulating the first nonlinear operation result according to the received second control signal to obtain an accumulation result, and sending the accumulation result to the second nonlinear operation circuit;
and the second nonlinear operation circuit is used for carrying out second nonlinear operation on the accumulation result according to the received second control signal to obtain an output result.
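The four on-off combinations of the two nonlinear operation circuits covered by claims 4 to 11 can be pictured as one data path with two optional stages. The following Python sketch is a simplified model under stated assumptions: the vector operation is taken to be element-wise multiplication, both nonlinear operations default to ReLU as placeholders (the claims do not name the functions), and `run_pipeline` is a hypothetical name. Operator-level gating by the second control signal is left out to keep the data path visible.

```python
from typing import Callable, Sequence


def relu(v: float) -> float:
    """Placeholder nonlinear function; the claims do not specify which one is used."""
    return max(0.0, v)


def run_pipeline(
    x: Sequence[float],
    w: Sequence[float],
    first_nonlinear_on: bool,
    second_nonlinear_on: bool,
    f1: Callable[[float], float] = relu,
    f2: Callable[[float], float] = relu,
) -> float:
    """Vector operation -> [first nonlinear] -> accumulation -> [second nonlinear]."""
    # Vector operation circuit (assumed: element-wise multiplication).
    values = [xi * wi for xi, wi in zip(x, w)]
    # First nonlinear operation circuit, present only if switched on.
    if first_nonlinear_on:
        values = [f1(v) for v in values]
    # Accumulation circuit.
    total = sum(values)
    # Second nonlinear operation circuit, present only if switched on.
    if second_nonlinear_on:
        total = f2(total)
    return total


# Hypothetical operands; the products are [2.0, -2.0, -3.0].
x = [1.0, -2.0, 3.0]
w = [2.0, 1.0, -1.0]

for nl1 in (False, True):
    for nl2 in (False, True):
        print(nl1, nl2, run_pipeline(x, w, nl1, nl2))
```

Switching a nonlinear stage off simply removes it from the path, so the accumulation circuit receives either the raw vector operation result or the first nonlinear operation result.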
12. A neural network chip, characterized in that it comprises a computing device according to any of claims 1-10.
13. A board, characterized in that the board includes a storage device, an interface device, a control device, and a neural network chip as claimed in claim 12;
the neural network chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment;
the control device is used for monitoring the state of the chip.
14. A computing method, characterized in that the method is applied to a computing device comprising a control unit and an operation unit;
the control unit receives a first control signal and controls the on-off state of some of the operation circuits in the operation unit according to the first control signal;
the control unit acquires input data, obtains a second control signal according to the input data, and sends the second control signal to one or more operation circuits in the operation unit, wherein the second control signal is used for controlling the on-off state of an operator of the one or more operation circuits;
the operation unit performs operation according to the received control signal to obtain an operation result.
15. The method of claim 14, wherein, when the device is integrated in a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), the operation unit comprises a vector operation circuit and an accumulation circuit;
the control unit receives the first control signal, determines whether to execute a nonlinear operation according to the first control signal, and, when the nonlinear operation is executed, realizes the nonlinear operation by multiplexing the operation circuits.
16. The method of claim 14, wherein the operation unit comprises a vector operation circuit, a first nonlinear operation circuit, an accumulation circuit, and a second nonlinear operation circuit;
the control unit controls the on-off states of the first nonlinear operation circuit and the second nonlinear operation circuit according to the first control signal.
17. The method of claim 16, wherein, when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off,
the control unit sends the second control signal to the vector operation circuit;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit;
and the accumulation circuit accumulates the vector operation result according to the received third control signal to obtain an output result.
18. The method of claim 16, wherein, when the first nonlinear operation circuit and the second nonlinear operation circuit are turned off,
the control unit sends the second control signals to the vector operation circuit and the accumulation circuit respectively;
the vector operation circuit acquires input data, performs vector operation on the input data according to the received second control signal to obtain a vector operation result, and sends the vector operation result to the accumulation circuit;
and the accumulation circuit accumulates the vector operation result according to the received second control signal to obtain an output result.
19. The method of claim 16, wherein, when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off,
the control unit sends the second control signal to the vector operation circuit;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the accumulation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the accumulation circuit;
the accumulation circuit accumulates the vector operation result according to the received third control signal to obtain an accumulation result and a fourth control signal, and sends the accumulation result and the fourth control signal to the second nonlinear operation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit;
and the second nonlinear operation circuit performs second nonlinear operation on the accumulation result according to the received fourth control signal to obtain an output result.
20. The method of claim 16, wherein, when the first nonlinear operation circuit is turned off but the second nonlinear operation circuit is not turned off,
the control unit sends the second control signals to the vector operation circuit, the accumulation circuit and the second nonlinear operation circuit respectively;
the vector operation circuit acquires input data, performs vector operation on the input data according to the received second control signal to obtain a vector operation result, and sends the vector operation result to the accumulation circuit;
the accumulation circuit accumulates the vector operation result according to the received second control signal to obtain an accumulation result, and sends the accumulation result to the second nonlinear operation circuit;
and the second nonlinear operation circuit performs second nonlinear operation on the accumulation result according to the received second control signal to obtain an output result.
21. The method of claim 16, wherein upon determining not to turn off the first nonlinear operation circuit but to turn off the second nonlinear operation circuit,
the control unit sends the second control signal to the vector operation circuit;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit;
the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, and sends the first nonlinear operation result and the fourth control signal to the accumulation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the accumulation circuit;
and the accumulation circuit accumulates the first nonlinear operation result according to the received fourth control signal to obtain an output result.
22. The method of claim 16, wherein upon determining not to turn off the first nonlinear operation circuit but to turn off the second nonlinear operation circuit,
the control unit sends the second control signals to the vector operation circuit, the first nonlinear operation circuit and the accumulation circuit respectively;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result, and sends the vector operation result to the first nonlinear operation circuit;
the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sends the first nonlinear operation result to the accumulation circuit;
and the accumulation circuit accumulates the first nonlinear operation result according to the received second control signal to obtain an output result.
23. The method of claim 16, wherein upon determining not to turn off the first nonlinear operation circuit and not to turn off the second nonlinear operation circuit,
the control unit sends the second control signal to the vector operation circuit;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result and a third control signal, and sends the vector operation result and the third control signal to the first nonlinear operation circuit, wherein the third control signal is used for controlling the on-off state of an operator of the first nonlinear operation circuit;
the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received third control signal to obtain a first nonlinear operation result and a fourth control signal, and sends the first nonlinear operation result and the fourth control signal to the accumulation circuit, wherein the fourth control signal is used for controlling the on-off state of an operator of the accumulation circuit;
the accumulation circuit accumulates the first nonlinear operation result according to the received fourth control signal to obtain an accumulation result and a fifth control signal, and sends the accumulation result and the fifth control signal to the second nonlinear operation circuit, wherein the fifth control signal is used for controlling the on-off state of an operator of the second nonlinear operation circuit;
and the second nonlinear operation circuit performs second nonlinear operation on the accumulation result according to the received fifth control signal to obtain an output result.
24. The method of claim 16, wherein upon determining not to turn off the first nonlinear operation circuit and not to turn off the second nonlinear operation circuit,
the control unit sends the second control signal to the vector operation circuit, the first nonlinear operation circuit, the accumulation circuit and the second nonlinear operation circuit respectively;
the vector operation circuit acquires input data, performs vector operation on the input data according to a received second control signal to obtain a vector operation result, and sends the vector operation result to the first nonlinear operation circuit;
the first nonlinear operation circuit performs first nonlinear operation on the vector operation result according to the received second control signal to obtain a first nonlinear operation result, and sends the first nonlinear operation result to the accumulation circuit;
the accumulation circuit accumulates the first nonlinear operation result according to the received second control signal to obtain an accumulation result, and sends the accumulation result to the second nonlinear operation circuit;
and the second nonlinear operation circuit performs second nonlinear operation on the accumulation result according to the received second control signal to obtain an output result.
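In the cascaded variants of the method (claims 17, 19, 21 and 23), each active circuit passes a newly numbered control signal to the next one, so the numbering depends on which nonlinear operation circuits are switched on. The short Python sketch below reproduces that numbering as a reading aid; the function name `cascaded_signal_chain` and the list-of-pairs representation are hypothetical and not part of the claims.

```python
from typing import List, Tuple

# Control signals after the first one, in the order the claims number them.
ORDINALS = ["second", "third", "fourth", "fifth"]


def cascaded_signal_chain(
    first_nonlinear_on: bool, second_nonlinear_on: bool
) -> List[Tuple[str, str]]:
    """Pair each active circuit with the control signal that gates its operators."""
    stages = ["vector operation circuit"]
    if first_nonlinear_on:
        stages.append("first nonlinear operation circuit")
    stages.append("accumulation circuit")
    if second_nonlinear_on:
        stages.append("second nonlinear operation circuit")
    # zip stops at the shorter sequence, so unused ordinals are dropped.
    return list(zip(ORDINALS, stages))


for nl1, nl2 in [(False, False), (False, True), (True, False), (True, True)]:
    chain = cascaded_signal_chain(nl1, nl2)
    print(nl1, nl2, [f"{signal} -> {stage}" for signal, stage in chain])
```

With both nonlinear operation circuits switched on, the chain runs up to a fifth control signal, matching claim 23; with both switched off it ends at the third control signal, matching claim 17.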
CN201910227493.5A 2019-03-25 2019-03-25 Computing device, method and related product Active CN111738428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910227493.5A CN111738428B (en) 2019-03-25 2019-03-25 Computing device, method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910227493.5A CN111738428B (en) 2019-03-25 2019-03-25 Computing device, method and related product

Publications (2)

Publication Number Publication Date
CN111738428A CN111738428A (en) 2020-10-02
CN111738428B CN111738428B (en) 2023-08-25

Family

ID=72645760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910227493.5A Active CN111738428B (en) 2019-03-25 2019-03-25 Computing device, method and related product

Country Status (1)

Country Link
CN (1) CN111738428B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1227366A (en) * 1998-02-19 1999-09-01 朗迅科技公司 Low power multiplier for CPU and DSP
JP2008034953A (en) * 2006-07-26 2008-02-14 Kobe Univ Image processing processor

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004287475A (en) * 2003-01-27 2004-10-14 Fujitsu Ten Ltd Electronic controller and electronic driving device
JP2005269392A (en) * 2004-03-19 2005-09-29 Nec Electronics Corp Receiving device, receiving method, and communication system and device
CN110378468B (en) * 2019-07-08 2020-11-20 浙江大学 Neural network accelerator based on structured pruning and low bit quantization

Also Published As

Publication number Publication date
CN111738428A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
CN109522052B (en) Computing device and board card
CN111047022B (en) Computing device and related product
CN109670581B (en) Computing device and board card
CN110059797B (en) Computing device and related product
CN110909870B (en) Training device and method
CN111930681B (en) Computing device and related product
CN109711540B (en) Computing device and board card
CN111488976B (en) Neural network computing device, neural network computing method and related products
CN111488963B (en) Neural network computing device and method
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021082725A1 (en) Winograd convolution operation method and related product
CN111382847B (en) Data processing device and related product
CN113031912B (en) Multiplier, data processing method, device and chip
CN111738428B (en) Computing device, method and related product
CN109740730B (en) Operation method, device and related product
CN112801276B (en) Data processing method, processor and electronic equipment
CN111382856B (en) Data processing device, method, chip and electronic equipment
CN111047023B (en) Computing device and related product
CN111381806A (en) Data comparator, data processing method, chip and electronic equipment
CN112765539B (en) Computing device, computing method and related product
CN111384944B (en) Full adder, half adder, data processing method, chip and electronic equipment
CN111260044B (en) Data comparator, data processing method, chip and electronic equipment
CN111062469B (en) Computing device and related product
WO2021185261A1 (en) Computing apparatus, method, board card and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant