CN117251134A - Data processing methods, devices, storage media and electronic equipment for neural networks

Info

Publication number
CN117251134A
Authority
CN
China
Prior art keywords: data, processed, order, accumulated, group
Prior art date
Legal status
Pending
Application number
CN202210642072.0A
Other languages
Chinese (zh)
Inventor
孙炜
祝叶华
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202210642072.0A
Publication of CN117251134A

Classifications

    • G06F7/5443 Sum of products
    • G06F7/4836 Computations with rational numbers
    • G06F7/485 Adding; Subtracting
    • G06F7/4876 Multiplying
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Nonlinear Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The present disclosure provides a data processing method and apparatus for a neural network, a storage medium, and an electronic device, and relates to the field of computer technology. The method includes the following steps: acquiring multiple groups of data to be processed that are input to a current operation unit in a neural network, where the data to be processed include an order and a mantissa; extracting a common order factor of the multiple groups of data to be processed, and multiplying the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated corresponding to each group of data to be processed; and accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulation result by the common order factor to obtain the output data of the current operation unit. The method and apparatus help improve the data processing efficiency of the neural network and reduce hardware power consumption.

Description

Data processing method, apparatus, storage medium, and electronic device for a neural network

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a data processing method for a neural network, a data processing apparatus for a neural network, a computer-readable storage medium, and an electronic device.

Background

In neural-network-based algorithms, data can be divided into floating-point data and fixed-point data, and the core computing unit is the multiply-accumulate operation. However, in the related art, floating-point data or fixed-point data need to be processed by different neural networks, which reduces the versatility of the neural network's data processing method; and during the accumulation operation, the large-bit-width data to be accumulated are accumulated directly, which makes the data processing method of the neural network slow and increases hardware power consumption, thereby reducing the data processing efficiency of the neural network.

It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.

Summary

The present disclosure provides a data processing method for a neural network, a data processing apparatus for a neural network, a computer-readable storage medium, and an electronic device, thereby alleviating, at least to a certain extent, the problem of low data processing efficiency in neural networks.

Additional features and advantages of the present disclosure will become apparent from the following detailed description, or may be learned in part by practice of the disclosure.

According to a first aspect of the present disclosure, a data processing method for a neural network is provided, including: acquiring multiple groups of data to be processed that are input to a current operation unit in the neural network, where the data to be processed include an order and a mantissa; extracting a common order factor of the multiple groups of data to be processed, and multiplying the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated corresponding to each group of data to be processed; and accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulation result by the common order factor to obtain the output data of the current operation unit.

According to a second aspect of the present disclosure, a data processing apparatus for a neural network is provided, including: a to-be-processed data acquisition module configured to acquire multiple groups of data to be processed that are input to a current operation unit in the neural network, where the data to be processed include an order and a mantissa; a common order factor extraction module configured to extract a common order factor of the multiple groups of data to be processed, and to multiply the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated corresponding to each group of data to be processed; and an output data acquisition module configured to accumulate the data to be accumulated corresponding to each group of data to be processed, and to multiply the accumulation result by the common order factor to obtain the output data of the current operation unit.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the data processing method for a neural network of the first aspect and its possible implementations are realized.

According to a fourth aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor. The processor is configured to execute, by executing the executable instructions, the data processing method for a neural network of the first aspect and its possible implementations.

The technical solution of the present disclosure has the following beneficial effects:

During the data processing of the neural network, multiple groups of input data to be processed are acquired, the common order factor of the data to be processed is extracted, and the mantissa in each group of data to be processed is multiplied by the remaining order after the common order factor is extracted, to obtain the data to be accumulated; the data to be accumulated are then accumulated, and the accumulation result is multiplied by the common order factor to obtain the output data of the current operation unit. On the one hand, by extracting a common order factor for each group of data to be processed on the basis of existing fixed-point multipliers and adders, the present disclosure can handle not only floating-point operations but also fixed-point operations, improving the versatility of the neural network's data processing method. On the other hand, multiplying the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated, and then accumulating the data to be accumulated, reduces the bit width of the data to be accumulated and of the accumulation operation, increases the operation speed, and further improves the data processing efficiency of the neural network while reducing hardware power consumption.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 shows the system architecture of the operating environment of this exemplary embodiment;

Figure 2 shows a flowchart of a data processing method for a neural network in this exemplary embodiment;

Figure 3 shows a schematic diagram of a multiply-accumulate operation on three groups of data to be processed in this exemplary embodiment;

Figure 4 shows a schematic diagram of the process of obtaining the order code of the common order factor in this exemplary embodiment;

Figure 5 shows a schematic diagram of the process of accumulating the data to be accumulated after the common order factor is extracted in this exemplary embodiment;

Figure 6 shows a process flowchart of the accumulation operation in a data processing method for a neural network in this exemplary embodiment;

Figure 7 shows a flowchart of another data processing method for a neural network in this exemplary implementation;

Figure 8 shows a schematic diagram of the data processing process of a convolutional neural network when the data to be processed are image data in this exemplary implementation;

Figure 9 shows a schematic diagram of the accumulation operation in the data processing of a convolutional neural network when the data to be processed are image data in this exemplary implementation;

Figure 10 shows a schematic structural diagram of a data processing apparatus for a neural network in this exemplary embodiment;

Figure 11 shows a schematic diagram of an electronic device in this exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in a variety of forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be more thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or other methods, components, devices, steps, and the like may be adopted. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.

In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

In a computer system, data can be represented in two ways: fixed point and floating point. If data are represented in fixed point, the position of the decimal point is fixed and the value range of the data is limited; fixed-point representation can be used to represent a decimal or an integer. For example, fixed-point representation can be used for monetary data: 88.00 or 00.88 can represent a monetary value with four digits of precision and two digits after the decimal point.

Since the fixed position of the decimal point determines a fixed number of integer digits and fractional digits, fixed-point representation makes the form of the data too rigid and is not suited to expressing particularly large and particularly small numbers at the same time; representing two numbers of very different magnitudes in fixed point requires a very long machine word, resulting in low utilization of the data storage unit. Therefore, in practical applications, most modern computers express data in floating point, which uses scientific notation to represent real numbers. Data expressed in floating point can include an order and a mantissa, a base and a sign bit. For example, the decimal fixed-point value 123.45 can be expressed in floating point as (-1)^0 × 1.2345 × 10^2, where (-1)^0 is the sign part, 1.2345 is the mantissa, 10 is the base, 2 is the order code, and 10^2 is the order. Floating-point numbers achieve a floating decimal point through the exponent, which controls the numerical range of the data. Floating-point data can represent a larger data range with higher calculation precision, but the cost of the hardware implementation is higher.
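For orientation only (this sketch is not part of the patent text; the function name and the base-10 choice are illustrative assumptions), the decomposition described above can be written as:

```python
import math

def to_float_form(x: float, base: int = 10):
    """Decompose x into (sign, mantissa, order code) so that
    x == (-1)**sign * mantissa * base**e, with 1 <= mantissa < base.
    Illustrative only: real hardware uses base 2 and fixed bit widths."""
    if x == 0:
        return 0, 0.0, 0
    sign = 0 if x > 0 else 1
    e = math.floor(math.log(abs(x), base))
    m = abs(x) / base ** e          # normalized mantissa
    return sign, m, e

print(to_float_form(123.45))        # (0, 1.2345, 2), i.e. (-1)^0 * 1.2345 * 10^2
```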

In the related art, the typical hardware structure of an artificial intelligence computing engine applied to neural networks can include a convolution computing engine, a vector operation engine, a storage unit, an overall scheduling unit, and a partial-sum accumulation unit. The core operation in both the convolution computing engine and the vector operation engine is the multiply-accumulate.

Taking the multiply-accumulate operation on floating-point data A, B, C, D, E, F as an example, any floating-point datum can be expressed in the following form:

A = 2^ea × ma    (1)

where ea is the order code (exponent), 2 is the base, and ma is the mantissa. The operation process of the multiply-accumulate operation unit is then as follows:

A×B + C×D + E×F = 2^(ea+eb) × (ma×mb) + 2^(ec+ed) × (mc×md) + 2^(ee+ef) × (me×mf)    (2)

Before the data are calculated, all input operands of the multiply-accumulate operation are first converted into the floating-point representation format; during data transmission, only the order and the mantissa of each datum then need to be transmitted. During the accumulation operation, the individual multiplication results are accumulated, which is equivalent to first computing A×B + C×D and then adding the result of E×F.

As can be seen from formula (2), each multiplication can include two parts, an order and a mantissa. Taking A×B as an example, the multiplication can include the order part 2^(ea+eb) and the mantissa part (ma×mb). Here, ma×mb is equivalent to a fixed-point multiplication, which can be completed by a multiplier in hardware; 2^(ea+eb) × (ma×mb) is equivalent to a shift operation on the result of ma×mb, which can be completed by a shifter in hardware. However, this may cause the bit width of A×B = 2^(ea+eb) × (ma×mb) to become very large. For example, in the floating-point number A = 2^ea × ma, the bit widths of ea and ma total 8 bits, of which the bit width of ea is 4 bits, representing a data range of 0 to 15; when the multiplication A×B is performed, the resulting ea+eb has a data range of 0 to 30, and the bit width that must be used becomes 5 bits.

In the above process, a single addition of the orders of two data already increases the bit width by 1 bit, and formula (2) requires not only the order additions but also the accumulation of the data. This not only enlarges the bit width during the accumulation operation; as the number of accumulations increases, the bit-width requirement on the adder's input data also keeps growing. The increased amount of computation slows down the operation, thereby increasing hardware power consumption and reducing data processing efficiency.
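For intuition only (not part of the patent text; mantissas are treated as plain integers for illustration), a naive multiply-accumulate over (order code, mantissa) pairs can be sketched as follows. Note how every product is shifted to full magnitude before accumulation, so intermediate widths grow with ea + eb and with the number of accumulations:

```python
def naive_mac(pairs):
    """pairs: ((ea, ma), (eb, mb)) operand pairs, each value being 2**e * m.
    Naive approach: shift every product to full magnitude, then add;
    the intermediate values (hence adder widths) grow with ea + eb."""
    acc = 0
    for (ea, ma), (eb, mb) in pairs:
        acc += (ma * mb) << (ea + eb)   # full-width shift before accumulation
    return acc

# A*B + C*D + E*F with 4-bit order codes (0..15):
print(naive_mac([((15, 3), (15, 5)), ((2, 7), (3, 1)), ((0, 2), (1, 4))]))
```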

In view of the above problems, exemplary embodiments of the present disclosure provide a data processing method for a neural network. The system architecture of the operating environment of this exemplary embodiment is described below with reference to Figure 1.

Referring to Figure 1, the system architecture 100 may include a terminal device 110 and a server 120. The terminal device 110 may be an electronic device such as a smartphone, a tablet computer, or a notebook computer. The server 120 generally refers to a backend system providing services related to the data processing of the neural network in this exemplary embodiment, for example a server that implements the data processing of the neural network. The server 120 may be a single server or a cluster formed by multiple servers, which is not limited in the present disclosure. The terminal device 110 and the server 120 may be connected through a wired or wireless communication link for data exchange.

In one embodiment, the neural network may be deployed on the terminal device 110, and the terminal device 110 executes the data processing method for a neural network in this exemplary embodiment.

In one implementation, the neural network may also be deployed on the server 120; after the terminal device 110 sends the data to be processed to the server 120, the server 120 executes the data processing method for a neural network in this exemplary embodiment.

The data processing method for a neural network is described below with reference to Figure 2. Figure 2 shows an exemplary flow of the method, including the following steps S210 to S230:

Step S210: acquire multiple groups of data to be processed that are input to the current operation unit in the neural network, where the data to be processed include an order and a mantissa;

Step S220: extract a common order factor of the multiple groups of data to be processed, and multiply the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated corresponding to each group of data to be processed;

Step S230: accumulate the data to be accumulated corresponding to each group of data to be processed, and multiply the accumulation result by the common order factor to obtain the output data of the current operation unit.

Based on the above method, on the one hand, extracting a common order factor for each group of data to be processed on the basis of existing fixed-point multipliers and adders means that the present disclosure can handle not only floating-point operations but also fixed-point operations, improving the versatility of the neural network's data processing method. On the other hand, multiplying the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted, to obtain the data to be accumulated, and then accumulating the data to be accumulated, reduces the bit width of the data to be accumulated and of the accumulation operation, increases the operation speed, and further improves the data processing efficiency of the neural network while reducing hardware power consumption.

Each step in Figure 2 is described in detail below.

Referring to Figure 2, in step S210, multiple groups of data to be processed that are input to the current operation unit in the neural network are acquired, where the data to be processed include an order and a mantissa.

A neural network can be an algorithmic mathematical model that imitates the behavioral characteristics of biological neural networks to perform distributed parallel information processing; relying on the complexity of the system, it achieves the purpose of processing information by adjusting the interconnections among a large number of internal nodes. Biological neural networks usually refer mainly to the neural network of the human brain. For example, the neural network may include a convolutional neural network, a class of feedforward neural networks that contain convolution computations, have a deep structure, and can perform feature learning.

The data to be processed can be the data input to the current operation unit of the neural network for computation; they can be of any type, such as image data, text data, or audio data, or intermediate data obtained after such data have been processed by some of the intermediate layers of the neural network. The data to be processed can include an order and a mantissa. At the hardware level, a datum to be processed can be represented by shifting the mantissa by the number of bits indicated by the order. For example, the datum A to be processed can be expressed as A = 2^ea × ma, where 2^ea is the order of A and ma is the mantissa of A; at the hardware level, A can then be represented by shifting ma to the left by ea bits. The present disclosure does not specifically limit the content of the data to be processed.
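As a one-line illustration (assuming an integer mantissa, which is not stated in the patent text), the hardware-level representation just described is a left shift:

```python
ea, ma = 3, 5            # A = 2^ea * ma, mantissa treated as an integer
A = ma << ea             # shifting ma left by ea bits: 5 * 2**3 = 40
assert A == 2**ea * ma
```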

Multiple groups of data to be processed can be input to the current operation unit for computation. The current operation unit can perform multiply-accumulate operations, and a multiply-accumulate operation unit can contain multiple accumulators and multipliers, where the multipliers perform the multiplication of the data and the accumulators add up the products obtained from the multiplications. When the accumulation operation is performed, its input needs to include, in addition to the result of each multiplier in the current module, the output data of the module one level above the current module.

In one implementation, each group among the multiple groups of data to be processed in the present disclosure can be used to perform a multiplication operation to obtain that group's product, and the data to be processed in different groups can be used to perform the accumulation operation. For example, as shown in Figure 3, A, B, C, D, E, F are six data to be processed, forming three groups: A and B are one group, C and D are one group, and E and F are one group. Each group can be used to perform a multiplication, namely A×B, C×D, and E×F; the different groups then take part in the accumulation, i.e., A×B, C×D, and E×F can be added to obtain A×B + C×D + E×F, to which the output data of the operation unit one level above is added to obtain the output data of the current operation unit.

In one implementation, the above data to be processed may include the output data of the operation unit one level above the current operation unit.

The operation unit one level above can be an operation unit that completes a multiply-accumulate operation and outputs the result of that operation, i.e., the output data of the upper-level operation unit, which can be expressed in the form of an order multiplied by a mantissa. The present disclosure does not specifically limit the operation mode of the upper-level operation unit or the specific form of its output data.

In step S220, a common order factor of the multiple groups of data to be processed is extracted, and the mantissa in each group of data to be processed is multiplied by the remaining order after the common order factor is extracted, to obtain the data to be accumulated corresponding to each group of data to be processed.

Each group of data to be processed can include two data to be processed; each group can be used to perform a multiplication operation, and data in different groups can be used to perform the accumulation operation. In one implementation, a group of data to be processed can also include the output data of the operation unit one level above the current operation unit. The present disclosure does not specifically limit the content of each group of data to be processed.

In this exemplary embodiment, each group of data to be processed can be treated as a whole and expressed in the form of a total order multiplied by a total mantissa. For example, a group of data to be processed is A×B, where A = 2^ea × ma and B = 2^eb × mb; then A×B can be expressed as A×B = 2^ea × ma × 2^eb × mb = 2^(ea+eb) × (ma×mb). In this group of data to be processed, the total order is 2^(ea+eb) and the total mantissa is (ma×mb).
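A minimal sketch of this grouping (not part of the patent text; integer mantissas and the function name are illustrative assumptions):

```python
def group_product(a, b):
    """a, b: (order code, mantissa) pairs with value 2**e * m. Returns the
    group's (total order code, total mantissa), whose value is 2**te * tm."""
    (ea, ma), (eb, mb) = a, b
    return ea + eb, ma * mb

te, tm = group_product((3, 5), (2, 7))      # A = 2^3 * 5, B = 2^2 * 7
assert (tm << te) == (5 << 3) * (7 << 2)    # 2^(ea+eb) * (ma*mb) == A * B
```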

The total order can be the product of the different orders in each group of data to be processed, and the total mantissa can be the product of the different mantissas in the group. It should be noted that if a group of data to be processed includes only one datum, the order of that datum can be the total order of the group, and its mantissa can be the total mantissa of the group.

The common order factor can be a common factor of the total orders of the groups of data to be processed. For example, there are two groups of data to be processed, A×B and C×D, where A×B = 2^(ea+eb) × (ma×mb) and C×D = 2^(ec+ed) × (mc×md); the common order factor can then be the total order of one group, 2^(ea+eb) or 2^(ec+ed). In one implementation, the common order factor can also be a preset order value, which can be an order value smaller than the minimum of the total orders; the common order factor can also be the order of the output data of the operation unit one level above. The present disclosure does not specifically limit how the common order factor and the total orders are obtained.

In one implementation, extracting the common order factor of the multiple groups of data to be processed may include:

determining the total order of each group of data to be processed, and comparing the total orders to obtain the common order factor.

For example, there are two groups of data to be processed: the first group A and B, and the second group C and D. The total order of the first group can be determined to be 2^(ea+eb) and the total order of the second group to be 2^(ec+ed). The total orders 2^(ea+eb) and 2^(ec+ed) are compared; when the bases of the total orders are the same, the comparison can be realized by comparing the order codes ea+eb and ec+ed, thereby obtaining the common order factor.

In one implementation, comparing the total orders to obtain the common order factor may include the following step:

determining the common order factor according to the minimum of the total orders.

In one implementation, if the current operation unit has no upper-level operation unit, the data to be processed include only the data processed by the current operation unit, and the total orders can include the total order of each group of data to be processed. If the current operation unit has an upper-level operation unit, the data to be processed can include the output data of the upper-level operation unit; when determining the common order factor, the total orders can include the order 2^emin' of the output data of the upper-level operation unit (i.e., the common order factor of the upper-level operation unit) and the total order of every other group of data to be processed, and the common order factor 2^emin can be determined according to the minimum of these total orders. In addition, when there is only one group of data to be processed, the total order of that group can be used directly as the common order factor.

For example, the common order factor is determined for two groups of data to be processed: the first group A and B, and the second group C and D, with total orders 2^(ea+eb) and 2^(ec+ed) respectively. Comparing 2^(ea+eb) and 2^(ec+ed): since the bases of the total orders are both 2, the order codes ea+eb and ec+ed can be compared, and the minimum emin of ea+eb and ec+ed is selected as the order code of the common order factor, which yields the common order factor 2^emin.
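A minimal sketch of this selection (illustrative only; the function name and parameter names are assumptions), covering both the case without and with an upper-level unit's order code emin':

```python
def common_order_code(total_order_codes, prev_emin=None):
    """Return emin, the order code of the common order factor 2**emin.
    prev_emin is the upper-level unit's order code emin', if any."""
    candidates = list(total_order_codes)
    if prev_emin is not None:
        candidates.append(prev_emin)
    return min(candidates)

print(common_order_code([5, 3, 7]))                 # 3
print(common_order_code([5, 3, 7], prev_emin=2))    # 2
```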

Determining the common order factor according to the minimum of the total orders can reduce the bit width of the data to be accumulated to the greatest extent; since the minimum of the total orders is the greatest common factor of the total orders of the groups of data to be processed, extracting it as the common order factor yields remaining orders of minimum bit width.

In one implementation, the remaining order may include the quotient obtained by dividing the total order of each group of data to be processed by the common order factor.

For example, if the current operation unit is the first-level operation unit, the data to be processed do not include the output result of an upper-level operation unit, and the common order factor 2^emin can be extracted from the groups of data to be processed A and B, C and D, E and F according to formula (3), which can include the common order factor and the data to be accumulated, the latter being the accumulated products of the mantissa and the remaining order corresponding to each group of data to be processed:

A×B + C×D + E×F = 2^emin × (2^((ea+eb)-emin) × K1 + 2^((ec+ed)-emin) × K2 + 2^((ee+ef)-emin) × K3)    (3)

For instance, in 2^((ea+eb)-emin) × K1 with K1 = (ma×mb), the remaining order 2^((ea+eb)-emin) can include the quotient obtained by dividing the total order 2^(ea+eb) of the group of data to be processed by the common order factor 2^emin.

In one implementation, the current operation unit can include a multiplier and an adder, and the multiplier can be used to multiply the mantissa in each group of data to be processed by the remaining order after the common order factor is extracted; for example, the multiplier can multiply the remaining order and the mantissa of each group of data to be processed in formula (3).

Here, emin is the minimum among ea+eb, ec+ed, ee+ef, ...; 2^emin can be extracted from the product of each group of data to be processed as the common order factor, with K1 = (ma×mb), K2 = (mc×md), K3 = (me×mf), ...
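A sketch of forming the data to be accumulated per formula (3) (illustrative only; integer mantissas assumed, names hypothetical):

```python
def to_accumulate(groups, emin):
    """groups: (total order code, K) per group, value = 2**e * K. Returns
    each group's datum to be accumulated, i.e. K shifted by the remaining
    order e - emin; the common factor 2**emin is kept aside and re-applied
    after the accumulation."""
    return [k << (e - emin) for e, k in groups]

groups = [(5, 3), (3, 7), (4, 2)]            # (ea+eb, K1), (ec+ed, K2), ...
emin = min(e for e, _ in groups)             # 3
terms = to_accumulate(groups, emin)          # [12, 7, 4]
assert (sum(terms) << emin) == sum(k << e for e, k in groups)
```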

In one implementation, the current operation unit can include an order processing unit and a mantissa processing unit, and the order and the mantissa of each datum to be processed can be sent to the order processing unit and the mantissa processing unit, respectively, for processing.

For example, there are multiple data to be processed A, B, C, D, E, F, as well as the output data Result_prev of the operation unit one level above the current operation unit; the data to be processed can be computed according to formula (4). Since every datum to be processed can include an order and a mantissa, the order and the mantissa of each datum to be processed can be sent to the order processing unit and the mantissa processing unit, respectively, for processing:

A×B + C×D + E×F + Result_prev = 2^emin × (2^((ea+eb)-emin) × K1 + 2^((ec+ed)-emin) × K2 + 2^((ee+ef)-emin) × K3 + 2^(emin'-emin) × R')    (4)

Here, emin can be the minimum among ea+eb, ec+ed, ee+ef, and emin'; emin can be used as the order code of the common order factor extracted from the product of each group of data to be processed, with K1 = (ma×mb), K2 = (mc×md), K3 = (me×mf), and R' being the mantissa of the output data of the operation unit one level above the current operation unit.

In one implementation, in the order processing unit, as shown in Figure 4, the order processing unit needs to receive the order code emin' of the common order factor of the upper-level operation unit, and then sends the order code of each datum to be processed into an adder to obtain the order code of the total order of each of the other groups of data to be processed. At this point, the order codes of the total orders can include ea+eb, ec+ed, ee+ef, and emin'; all of them are input to a MIN module to obtain the minimum of the total order codes, from which the order code emin of the common order factor of the current operation unit can be obtained.

In one implementation, if a group of data to be processed is the output data of the upper-level operation unit, the total order of that group can be the common order factor of the upper-level operation unit.

In one implementation, if the data to be processed include the output data of the upper-level operation unit, the common order factor can be obtained by comparing the total order of each group of data to be processed with the common order factor of the upper-level operation unit.

In one implementation, in the mantissa processing unit, as shown in Figure 5, the mantissas of the data to be processed can be input to multipliers to obtain the mantissas K1, K2, and K3 corresponding to each group of data to be processed, while the mantissa R' of the upper-level operation unit is received. Then ea+eb, ec+ed, ee+ef, emin', and emin are input to subtractors to compute the values of ea+eb-emin, ec+ed-emin, ee+ef-emin, and emin'-emin; K1, K2, K3, and R' can then be shifted according to these values, and the shifted results added, to obtain the mantissa R of the output result of the current operation unit. Finally, the order code emin of the common order factor in the output result of the current operation unit, together with R, can be output to the operation unit one level below, which continues to process the data.
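A compact sketch of the mantissa path just described (illustrative only; integer mantissas assumed, names hypothetical):

```python
def mantissa_unit(total_order_codes, mantissas, prev, emin):
    """Mantissa path of Figure 5. total_order_codes: ea+eb, ec+ed, ...;
    mantissas: K1, K2, ...; prev: (emin', R') from the upper-level unit.
    Returns R, so that the unit outputs (emin, R) with value 2**emin * R."""
    emin_prev, r_prev = prev
    r = sum(k << (e - emin) for e, k in zip(total_order_codes, mantissas))
    r += r_prev << (emin_prev - emin)   # align the upper-level partial sum
    return r

emin = min([5, 3, 4, 4])                # the last candidate is emin' = 4
R = mantissa_unit([5, 3, 4], [3, 7, 2], prev=(4, 9), emin=emin)
assert (R << emin) == (3 << 5) + (7 << 3) + (2 << 4) + (9 << 4)
```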

In one embodiment, based on fixed-point multipliers and fixed-point adders, the present disclosure extracts a common order factor for each group of data to be processed, so that this solution can handle both floating-point and fixed-point operations, improving the versatility of the neural network's data processing method. In addition, multiplying the remaining order of each group of data to be processed, after the common order factor has been extracted, by the mantissa to obtain the data to be accumulated reduces the bit width of the addition operation. For example, for data A and B to be processed with a bit width of 16 bits, the order codes ea and eb can use 5 bits and the mantissas ma and mb can use 11 bits; ea and eb can then each represent the data range 0 to 31. If ea and eb are added directly, the resulting ea+eb has a data range of 0 to 62, so the bit width that must be used becomes 6 bits, and as the additions accumulate, the required bit width keeps growing. In contrast, the present disclosure extracts the common order factor when computing A×B: during the order addition, by subtracting the order code emin of the common order factor, the data range of ea+eb-emin remains 0 to 31, and the result after the addition can still use a bit width of 5 bits.
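A quick numeric check of the bit-width claim (illustrative only, mirroring the 16-bit example above; the residual width in general depends on the spread of the total order codes):

```python
ea, eb = 31, 31                 # 5-bit order codes, each in 0..31
assert (ea + eb).bit_length() == 6          # naive sum: 62 needs 6 bits
emin = 31                       # e.g. the smaller of the total order codes
assert (ea + eb - emin).bit_length() <= 5   # residual 31 fits in 5 bits
```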

Continuing with Figure 2, in step S230, the data to be accumulated corresponding to each group of data to be processed are accumulated, and the accumulation result is multiplied by the common order factor to obtain the output data of the current operation unit.

The data to be accumulated can include, for each group of data to be processed, the corresponding mantissa and the remaining order after the common order factor has been extracted. For example, the present disclosure processes a group of data to be processed A = 2^ea × ma and B = 2^eb × mb whose common order factor is 2^emin; the data to be accumulated corresponding to this group can then be expressed as 2^((ea+eb)-emin) × (ma×mb). The present disclosure does not specifically limit the content of the data to be accumulated.

In one implementation, accumulating the data to be accumulated corresponding to each group of data to be processed and multiplying the accumulation result by the common order factor to obtain the output data of the current operation unit may include:

accumulating, by an adder, the data to be accumulated corresponding to each group of data to be processed, and multiplying, by a multiplier, the accumulation result by the common order factor, to obtain the output data of the current operation unit.

In one implementation, the output data of the current operation unit can be the product of the common order factor and the accumulation result, and can be expressed in the form of an order multiplied by a mantissa. The present disclosure does not specifically limit the form of the output data of the current operation unit.

In one implementation, the output data of the current operation unit can be input to the operation unit one level below to continue the processing of the data.

In one implementation, the current operation unit may further include a carry save adder, and accumulating the data to be accumulated corresponding to each group of data to be processed by means of the adder may include the following steps:

when the number of items of data to be accumulated is less than or equal to a preset number of items, adding the data to be accumulated by the adder to obtain the accumulation result;

when the number of items of data to be accumulated is greater than the preset number of items, adding the data to be accumulated by combining the carry save adder and the adder to obtain the accumulation result.

A carry save adder can be an adder used for summing a large number of operands. It takes three source operands as input and can output two operation results; through the CSA, the addition of three numbers can be compressed into the addition of two numbers, and when multiple data are added, the carries can be saved so that carry propagation is performed only once.

The preset number of items may be two, three, or another number of items; the present disclosure does not specifically limit it.

In the calculation of each CSA, three operands a1, a2, a3 can be input, and two output data b1 and b2 can be obtained according to the following formulas (5) and (6):

b1 = a1 ^ a2 ^ a3    (5)

b2 = ((a1 & a2) | (a2 & a3) | (a1 & a3)) << 1    (6)

在一种实施方式中,若待累加数据只有两项,则可以通过加法器直接将待处理数据进行相加;若待累加数据大于两项,则可以结合进位保留加法器(Carry Save Adder,CSA)对累加运算进行压缩,将待累加数据压缩至两项时,再使用加法器将这两项相加,以获取累加结果。In one implementation, if there are only two items of data to be accumulated, the data to be processed can be added directly through an adder; if there are more than two items of data to be accumulated, a Carry Save Adder (CSA) can be used ) compresses the accumulation operation. When the data to be accumulated is compressed to two items, the adder is then used to add the two items to obtain the accumulation result.

例如,如图6所示,可以结合进位保留加法器和加法器对待累加数据a0~a5求和。由于CSA可以输入三个数据,输出两个数据,因此,可以在上述六个待累加数据中选取三个数据输入第一个CSA,再将剩余的三个数据输入第二个CSA;再将第一个CSA的两个输出和第二个CSA的一个输出作为输入数据,输入到第三个CSA中,以获取两个输出数据;将第三个CSA的两个输出数据和第二个CSA剩余的一个输出数据作为输入数据,输入到第四个CSA中,获取两个输出数据b0和b1,此时,通过多个CSA将六个待累加数据a0~a5被压缩为两个输出数据b0和b1,再将b0和b1输入至加法器进行加法运算,以获取待累加数据的累加结果。For example, as shown in Figure 6, a carry-preserving adder and an adder can be combined to sum the accumulated data a 0 to a 5 . Since CSA can input three data and output two data, you can select three data from the above six data to be accumulated and input them into the first CSA, and then input the remaining three data into the second CSA; then input the third data into the second CSA. The two outputs of one CSA and one output of the second CSA are used as input data into the third CSA to obtain two output data; the two output data of the third CSA and the remaining data of the second CSA are An output data of is used as input data and is input into the fourth CSA to obtain two output data b 0 and b 1 . At this time, the six data to be accumulated a 0 ~ a 5 are compressed into two through multiple CSAs. Output data b 0 and b 1 , and then input b 0 and b 1 to the adder for addition operation to obtain the accumulation result of the data to be accumulated.

As can be seen from formulas (5) and (6), the CSA logic is relatively simple; compressing the data to be accumulated with CSAs before the final addition therefore effectively improves the operation speed and reduces the power consumption of the accumulation.

In one implementation, Figure 7 shows an exemplary flow of the neural network data processing method of the present disclosure. Referring to Figure 7, data processing can be performed based on the neural network according to steps S701 to S709; a software sketch of the same flow follows the list.

Step S701: input multiple groups of data to be processed into the neural network, the data to be processed including an order and a mantissa;

Step S702: obtain the total order of each group of data to be processed;

Step S703: take the minimum of the total orders of the groups as the common order factor;

Step S704: extract the common order factor from each group of data to be processed;

Step S705: take the mantissas of each group of data to be processed, together with the remaining order left after extracting the common order factor, as the data to be accumulated;

Step S706: judge whether the number of items of data to be accumulated is two; if so, go to step S707; otherwise, go to step S708;

Step S707: add the two items of data to be accumulated directly through the adder to obtain the accumulation result;

Step S708: combine the carry-save adder and the adder to add the data to be accumulated, so as to obtain the accumulation result;

Step S709: multiply the accumulation result by the common order factor to obtain the output data of the current operation unit.
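
A minimal Python sketch of steps S701 to S709 (our own illustration, emulating in software what the disclosure describes for hardware; each group is assumed to hold two operands given as (order, mantissa) pairs):

def process_unit(groups):
    # S702: the total order of each group is the sum of its operands' orders.
    totals = [ea + eb for (ea, _), (eb, _) in groups]
    # S703: the minimum total order gives the common order factor 2**emin.
    emin = min(totals)
    # S704/S705: mantissa product times the remaining order forms the data to be accumulated.
    terms = [ma * mb * 2 ** (t - emin)
             for ((_, ma), (_, mb)), t in zip(groups, totals)]
    # S706-S708: accumulation (performed by the CSA tree and adder in hardware).
    acc = sum(terms)
    # S709: multiply the accumulation result back by the common order factor.
    return acc * 2 ** emin

Because every remaining order 2**(t - emin) is a non-negative power of two, the accumulation reduces to shifted additions of mantissa products, which is what makes the CSA tree applicable.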

In one implementation, the neural network may include a convolutional neural network, and the data to be processed may include the image block data covered in the image data by the convolution kernel of the convolutional neural network, together with the convolution kernel.

For example, the present disclosure can be combined with a convolutional neural network for image data processing. As shown in Figure 8, the image data to be processed is first input into the convolutional neural network, which includes a convolution kernel, and the target image block is obtained from the image block covered by the kernel in the image data. The data to be processed may include the pixel data A, B, E, F of the image block and the kernel weights W, X, Y, Z, each of which can be expressed as an order multiplied by a mantissa. The pixel data of the target image block are then multiplied by the corresponding kernel weights and the products are added to perform the convolution; specifically, the convolution can be computed according to the following formula:

A×W + B×X + E×Y + F×Z = 2^emin × (K1×2^(ea+ew−emin) + K2×2^(eb+ex−emin) + K3×2^(ee+ey−emin) + K4×2^(ef+ez−emin))

where 2^emin is the minimum of the total orders of the groups of data to be processed; each group of data to be processed is multiplied, and the different groups are accumulated; K1 = (ma×mw), K2 = (mb×mx), K3 = (me×my), K4 = (mf×mz).

During the convolution of the target image block's pixel data with the convolution kernel, there are four groups of data to be processed: A and W, B and X, E and Y, and F and Z. Each value is first expressed as an order multiplied by a mantissa so that the convolution can be performed. The order code emin of the common order factor is then obtained as the minimum of the group total-order codes ea+ew, eb+ex, ee+ey, ef+ez, which gives the common order factor 2^emin. The common factor is extracted from each group of data to be processed to obtain the data to be accumulated, which include the mantissa products K1, K2, K3, K4 of the groups and the remaining orders left after the extraction. Since there are four items to be accumulated, as shown in Figure 9, the CSAs can compress them down to two items, which are then input to the adder to obtain the accumulation result. Finally, the convolution result of the current target image block with the kernel is obtained from the product of the common order factor and the accumulation result, and is output to the convolution of the next target image block with the kernel to obtain the next convolution result.
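
As a hypothetical numeric check (values chosen by us, reusing the process_unit sketch above):

# Four groups: (A, W), (B, X), (E, Y), (F, Z), each operand as (order, mantissa).
groups = [((1, 1.5), (0, 1.25)),   # A = 2**1 * 1.5,  W = 2**0 * 1.25
          ((2, 1.0), (-1, 1.5)),   # B = 2**2 * 1.0,  X = 2**-1 * 1.5
          ((0, 1.75), (1, 1.0)),   # E = 2**0 * 1.75, Y = 2**1 * 1.0
          ((3, 1.25), (0, 1.0))]   # F = 2**3 * 1.25, Z = 2**0 * 1.0
out = process_unit(groups)
# Direct evaluation of A*W + B*X + E*Y + F*Z for comparison:
ref = sum((2 ** ea * ma) * (2 ** eb * mb) for (ea, ma), (eb, mb) in groups)
assert out == ref == 20.25

Here the total orders are 1, 1, 1 and 3, so emin = 1 and the common order factor 2**1 is multiplied back only once, after the four mantissa products have been accumulated.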

In addition, exemplary embodiments of the present disclosure also provide a data processing apparatus for a neural network. Referring to Figure 10, the data processing apparatus 1000 of the neural network may include:

a to-be-processed data acquisition module 1010, configured to acquire multiple groups of data to be processed input to the current operation unit in the neural network, the data to be processed including an order and a mantissa;

a common order factor extraction module 1020, configured to extract the common order factor of the multiple groups of data to be processed, and to multiply the mantissa in each group of data to be processed by the remaining order after extracting the common order factor, obtaining the data to be accumulated corresponding to each group;

an output data acquisition module 1030, configured to accumulate the data to be accumulated corresponding to each group of data to be processed, and to multiply the accumulation result by the common order factor, obtaining the output data of the current operation unit.

In one implementation, the neural network may include a convolutional neural network, and the data to be processed may include the image block data covered in the image data by the convolution kernel of the convolutional neural network, together with the convolution kernel.

In one implementation, the data to be processed may also include the output data of the operation unit one level above the current operation unit.

In one implementation, extracting the common order factor of the multiple groups of data to be processed includes:

determining the total order of each group of data to be processed, and comparing the total orders to obtain the common order factor. For example, if the total orders of three groups are 2^3, 2^1 and 2^4, the common order factor is 2^1.

In one implementation, comparing the total orders to obtain the common order factor includes:

determining the common order factor according to the minimum of the total orders.

In one implementation, the remaining order may include the quotient obtained by dividing the total order of each group of data to be processed by the common order factor.
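
For instance (an illustrative case of this quotient, with symbols taken from the convolution example above): a group whose total order is 2^(ea+ew) has remaining order 2^(ea+ew) / 2^emin = 2^(ea+ew−emin), which is the power-of-two weight applied to that group's mantissa product during accumulation.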

In one implementation, the current operation unit may include a multiplier and an adder, and multiplying the mantissa in each group of data to be processed by the remaining order after extracting the common order factor includes:

multiplying, by the multiplier, the mantissa in each group of data to be processed by the remaining order after extracting the common order factor;

and accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulation result by the common order factor to obtain the output data of the current operation unit, includes:

accumulating, by the adder, the data to be accumulated corresponding to each group of data to be processed, and multiplying, by the multiplier, the accumulation result by the common order factor, to obtain the output data of the current operation unit.

In one implementation, the current operation unit may further include a carry-save adder, and accumulating, by the adder, the data to be accumulated corresponding to each group of data to be processed includes:

when the number of items of the data to be accumulated is less than or equal to the preset number of items, adding the data to be accumulated through the adder to obtain the accumulation result;

when the number of items of the data to be accumulated is greater than the preset number of items, combining the carry-save adder and the adder to add the data to be accumulated to obtain the accumulation result.

The specific details of each part of the above apparatus have been described in detail in the method implementations and are not repeated here.

Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product including program code. When the program product runs on an electronic device, the program code causes the electronic device to perform the steps described in the "Exemplary Methods" section of this specification according to the various exemplary embodiments of the present disclosure. In an optional implementation, the program product may be implemented as a portable compact disc read-only memory (CD-ROM) including the program code, and may run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program usable by, or in combination with, an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, and the like, or any suitable combination of the foregoing.

Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

Exemplary embodiments of the present disclosure also provide an electronic device. The electronic device may include a processor and a memory. The memory stores executable instructions of the processor, such as program code, and the processor performs the method of the exemplary embodiments by executing those instructions.

Referring to Figure 11, the electronic device is described by way of example in the form of a general-purpose computer. It should be understood that the electronic device 1100 shown in Figure 11 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.

As shown in Figure 11, the electronic device 1100 may include: a processor 1110, a memory 1120, a bus 1130, an I/O (input/output) interface 1140, and a network adapter 1150.

The processor 1110 may include one or more processing units. For example, the processor 1110 may include a central processing unit (CPU), an application processor (AP), a modem processor, a display process unit (DPU), a graphics processing unit (GPU), an image signal processor (ISP), a controller, an encoder, a decoder, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The data processing method of the neural network in this exemplary embodiment may be executed by the GPU or the NPU. In one implementation, after the data to be processed are acquired, the neural network may be deployed on the GPU; the GPU extracts the common order factor from the data to be processed, takes the remaining data after the extraction as the data to be accumulated, adds the data to be accumulated through the CSA and the adder to obtain the accumulation result, and takes the product of the common order factor and the accumulation result as the output data of the current operation unit.

The memory 1120 may include volatile memory, such as a RAM 1121 and a cache unit 1122, and may also include non-volatile memory, such as a ROM 1123. The memory 1120 may also include one or more program modules 1124, including but not limited to an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. For example, the program modules 1124 may include the modules of the apparatus 1000 described above.

The bus 1130 is used to realize connections between the different components of the electronic device 1100 and may include a data bus, an address bus, and a control bus.

The electronic device 1100 may communicate with one or more external devices (for example, a terminal device, a keyboard, a mouse, or an external controller) through the I/O interface 1140.

The electronic device 1100 may communicate with one or more networks through the network adapter 1150. For example, the network adapter 1150 may provide mobile communication solutions such as 3G/4G/5G, or wireless communication solutions such as wireless LAN, Bluetooth, or near-field communication. The network adapter 1150 may communicate with the other modules of the electronic device 1100 through the bus 1130.

Although not shown in Figure 11, other hardware and/or software modules may also be provided in the electronic device 1100, including but not limited to: a display, microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. In fact, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

Those skilled in the art will understand that the various aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, and the like), or an implementation combining hardware and software, which may collectively be referred to here as a "circuit", "module", or "system". Other embodiments of the present disclosure will readily occur to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data processing method for a neural network, comprising:
acquiring multiple groups of data to be processed input to a current operation unit in the neural network, the data to be processed including an order and a mantissa;
extracting a common order factor of the multiple groups of data to be processed, and multiplying the mantissa in each group of data to be processed by the remaining order after extracting the common order factor, to obtain data to be accumulated corresponding to each group of data to be processed; and
accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulation result by the common order factor, to obtain output data of the current operation unit.

2. The method according to claim 1, wherein the neural network comprises a convolutional neural network, and the data to be processed comprise image block data covered in image data by a convolution kernel of the convolutional neural network, and the convolution kernel.

3. The method according to claim 1, wherein the data to be processed comprise output data of an operation unit one level above the current operation unit.

4. The method according to claim 1, wherein extracting the common order factor of the multiple groups of data to be processed, and multiplying the mantissa in each group of data to be processed by the remaining order after extracting the common order factor, comprises:
determining a total order of each group of data to be processed, and comparing the total orders to obtain the common order factor;
wherein the remaining order comprises the quotient obtained by dividing the total order of each group of data to be processed by the common order factor.

5. The method according to claim 4, wherein comparing the total orders to obtain the common order factor comprises:
determining the common order factor according to the minimum of the total orders.

6. The method according to claim 1, wherein the current operation unit comprises a multiplier and an adder; multiplying the mantissa in each group of data to be processed by the remaining order after extracting the common order factor comprises:
multiplying, by the multiplier, the mantissa in each group of data to be processed by the remaining order after extracting the common order factor;
and accumulating the data to be accumulated corresponding to each group of data to be processed, and multiplying the accumulation result by the common order factor, to obtain the output data of the current operation unit, comprises:
accumulating, by the adder, the data to be accumulated corresponding to each group of data to be processed, and multiplying, by the multiplier, the accumulation result by the common order factor, to obtain the output data of the current operation unit.

7. The method according to claim 6, wherein the current operation unit further comprises a carry-save adder, and accumulating, by the adder, the data to be accumulated corresponding to each group of data to be processed comprises:
when the number of items of the data to be accumulated is less than or equal to a preset number of items, adding the data to be accumulated by the adder to obtain the accumulation result;
when the number of items of the data to be accumulated is greater than the preset number of items, combining the carry-save adder and the adder to add the data to be accumulated to obtain the accumulation result.

8. A data processing apparatus for a neural network, comprising:
a to-be-processed data acquisition module configured to acquire multiple groups of data to be processed input to a current operation unit in the neural network, the data to be processed including an order and a mantissa;
a common order factor extraction module configured to extract a common order factor of the multiple groups of data to be processed, and to multiply the mantissa in each group of data to be processed by the remaining order after extracting the common order factor, to obtain data to be accumulated corresponding to each group of data to be processed; and
an output data acquisition module configured to accumulate the data to be accumulated corresponding to each group of data to be processed, and to multiply the accumulation result by the common order factor, to obtain output data of the current operation unit.

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.

10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method according to any one of claims 1 to 7 by executing the executable instructions.
CN202210642072.0A 2022-06-08 2022-06-08 Data processing methods, devices, storage media and electronic equipment for neural networks Pending CN117251134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210642072.0A CN117251134A (en) 2022-06-08 2022-06-08 Data processing methods, devices, storage media and electronic equipment for neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210642072.0A CN117251134A (en) 2022-06-08 2022-06-08 Data processing methods, devices, storage media and electronic equipment for neural networks

Publications (1)

Publication Number Publication Date
CN117251134A (en) 2023-12-19

Family

ID=89135530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210642072.0A Pending CN117251134A (en) 2022-06-08 2022-06-08 Data processing methods, devices, storage media and electronic equipment for neural networks

Country Status (1)

Country Link
CN (1) CN117251134A (en)

Similar Documents

Publication Publication Date Title
CN112230881B (en) Floating point processor
CN109284130B (en) Neural network operation device and method
TWI763079B (en) Multiplier and method for floating-point arithmetic, integrated circuit chip, and computing device
CN105844330A (en) Data processing method of neural network processor and neural network processor
CN102339217B (en) Fusion processing device and method for floating-point number multiplication-addition device
CN113076083B (en) Data multiply-add operation circuit
JP7320582B2 (en) Neural network product-sum calculation method and apparatus
WO2024120249A1 (en) Data processing method and apparatus, device, and storage medium
TW202109281A (en) Signed multiword multiplier
CN112148249B (en) Dot product operation realization method and device, electronic equipment and storage medium
CN117251134A (en) Data processing methods, devices, storage media and electronic equipment for neural networks
CN117648959A (en) Multi-precision operand operation device supporting neural network operation
CN118151885A (en) Model conversion method, electronic device and storage medium
CN117420982A (en) Chip comprising a fused multiply-accumulator, device and control method for data operations
CN115237372A (en) Multiplication circuit, machine learning operation circuit, chip and data processing method
TW202333041A (en) System and method performing floating-point operations
CN111930670B (en) Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium
CN103365826A (en) Small-area radical-3 FFT (Fast Fourier Transform) butterfly-shaped unit
US20090319589A1 (en) Using fractional exponents to reduce the computational complexity of numerical operations
CN113591031A (en) Low-power-consumption matrix operation method and device
CN111930674A (en) Multiply-accumulate operation device and method, heterogeneous intelligent processor and electronic equipment
WO2019127480A1 (en) Method for processing numerical value data, device, and computer readable storage medium
CN118034642B (en) Floating point operation method, device, electronic equipment and storage medium
CN118735807A (en) A non-local mean denoising device and method based on FPGA
CN117492692A (en) Floating point calculation methods and in-memory computing architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination