CN115480731A - Operation method, device, chip, equipment and medium

Info

Publication number: CN115480731A
Application number: CN202211067400.5A
Authority: CN
Inventors: 李慧敏; 王京; 漆维; 欧阳剑
Original assignee: Kunlun Core Beijing Technology Co ltd
Current assignee: Kunlun Core Beijing Technology Co ltd
Priority date: 2022-09-01
Filing date: 2022-09-01
Publication date: 2022-12-16

Abstract

The present disclosure provides an operation method executed by an operation device, a chip, a device and a medium, and relates to the technical field of computers, in particular to the technical field of chips. The implementation scheme is as follows: in response to determining that the target operation type is a combination of arithmetic logic operation and table lookup operation, obtaining a first number of values to be input; respectively inputting a first number of values to be input into a first number of arithmetic logic units to obtain arithmetic logic operation results output by the first number of arithmetic logic units; respectively inputting the first number of arithmetic logic operation results into the first number of table look-up units to obtain table look-up operation results output by the first number of table look-up units; in response to determining that the target operation type is a non-arithmetic logic operation, obtaining a second number of values to be input; and inputting the second number of values to be input into the second number of non-arithmetic logic units respectively to obtain the non-arithmetic logic operation results output by the second number of non-arithmetic logic units respectively.

Description

Operation method, device, chip, equipment and medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an operation method performed by an operation device, an operation device, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the development of artificial intelligence technology, more and more applications based on artificial intelligence technology have achieved effects far exceeding those of traditional algorithms. Deep learning is both data-intensive and computation-intensive, and its algorithms iterate rapidly. In deep learning algorithms, various operation types need to be used in order to improve the capability of the neural network model to process complex tasks.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.

Disclosure of Invention

The present disclosure provides an operation method performed by an operation device, an operation device, an electronic device, a computer-readable storage medium, and a computer program product.

According to an aspect of the present disclosure, there is provided an operation method performed by an operation device including a first number of first operation units each including an arithmetic logic unit and a table look-up unit, and a second number of non-arithmetic logic units, the method including: in response to determining that the target operation type is a combination of arithmetic logic operation and table lookup operation, obtaining a first number of values to be input; inputting the first number of values to be input into the first number of arithmetic logic units respectively to obtain arithmetic logic operation results output by the first number of arithmetic logic units respectively; inputting the first number of arithmetic logic operation results into the first number of table look-up units respectively to obtain table look-up operation results output by the first number of table look-up units respectively; and in response to determining that the target operation type is a non-arithmetic logic operation, obtaining a second number of values to be input; and inputting the second number of values to be input into the second number of non-arithmetic logic units respectively to obtain the non-arithmetic logic operation results output by the second number of non-arithmetic logic units respectively.

According to another aspect of the present disclosure, there is provided an operation device including: a first number of first operation units, each first operation unit of the first number of first operation units comprising an arithmetic logic unit and a table look-up unit; a second number of non-arithmetic logic units; an acquisition unit configured to acquire a first number of values to be input in response to determining that a target operation type is a combination of an arithmetic logic operation and a table lookup operation; and an input unit configured to input the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain arithmetic logic operation results output by the first number of arithmetic logic units, respectively, and to input the first number of arithmetic logic operation results into the first number of table look-up units, respectively, to obtain table lookup operation results output by the first number of table look-up units, respectively; wherein the acquisition unit is further configured to acquire a second number of values to be input in response to determining that the target operation type is a non-arithmetic logic operation, and the input unit is further configured to input the second number of values to be input into the second number of non-arithmetic logic units, respectively, to obtain non-arithmetic logic operation results output by the second number of non-arithmetic logic units, respectively.

According to another aspect of the present disclosure, there is provided a chip comprising the arithmetic device as described above.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method of operation.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the above-described operation method.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program is capable of implementing the above-mentioned operational method when executed by a processor.

According to one or more embodiments of the disclosure, multiple operation types can be supported, and the performance of the operation device is improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

FIG. 1 shows a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an exemplary embodiment of the present disclosure;

FIG. 2 shows a flow chart of a method of operation according to an exemplary embodiment of the present disclosure;

FIG. 3 shows a flowchart of a method of building a lookup table according to an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of a lookup table according to an example embodiment of the present disclosure;

FIG. 5 shows a flowchart of a portion of an example process of a method of operation according to an example embodiment of the present disclosure;

FIG. 6 is a diagram illustrating a table lookup operation process according to an exemplary embodiment of the present disclosure;

FIG. 7 shows a block diagram of a computing device according to an exemplary embodiment of the present disclosure;

FIG. 8 shows a block diagram of a vector processor according to an example embodiment of the present disclosure;

FIG. 9 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.

The terminology used in the description of the various described examples in this disclosure is for the purpose of describing the particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the element may be one or a plurality of. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.

In the related art, a plurality of operation units of the same operation type are generally integrated in an operation device, so that the operation units can perform parallel operations on a plurality of input values, thereby improving operation efficiency. In this case, the operation device can only support a specific operation type. When a plurality of different types of operations need to be performed on the original input data, for example, an arithmetic logic operation followed by a non-arithmetic logic operation, the original input data must first be transmitted to a device including arithmetic logic units, and the output arithmetic logic operation result is written into a storage unit. The arithmetic logic operation result is then read from the storage unit and transmitted to a device including non-arithmetic logic units. This process has low operation efficiency and occupies more hardware resources.

In deep learning algorithms, in order to improve the capability of the neural network model to process complex tasks, many nonlinear functions that need to be implemented by table lookup operations are used. In particular, functions such as the sigmoid function, the tanh function, and the GELU function are applied in the activation function layer of the neural network model.

Generally, in the neural network model, before data is transmitted into the activation function layer, specific arithmetic logic operations, such as biasing or down-sampling, need to be performed by an arithmetic logic unit. In the related art, the arithmetic logic unit and the unit for performing the table lookup operation are independent operation paths. In this case, in the actual operation process, the original input data must first be transferred into the arithmetic logic unit, and the output arithmetic logic operation result is written into the storage unit. The arithmetic logic operation result is then read from the storage unit and written into the unit for performing the table lookup operation, which makes the operation inefficient.

Based on this, the present disclosure provides an operation method performed by an operation device. The operation device includes a first number of first operation units, each composed of an arithmetic logic unit and a table look-up unit, to support both arithmetic logic operations and table look-up operations, and further includes a second number of non-arithmetic logic units to support non-arithmetic logic operations. The corresponding units can be selected based on the target operation type, and a corresponding number of values to be input can be acquired, thereby improving operation performance.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.

Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the operation method to be performed.

In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.

The user may use client devices 101, 102, 103, 104, 105, and/or 106 to obtain the target operation type and the value to be input. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 can also run any of a variety of additional server applications and/or mid-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.

In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or a smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical host and Virtual Private Server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.

In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

Fig. 2 shows a flowchart of an operation method 200 performed by an operation device according to an exemplary embodiment of the present disclosure. The operation device includes a first number of first operation units and a second number of non-arithmetic logic units, each of the first number of first operation units including an arithmetic logic unit and a table look-up unit. As shown in fig. 2, the method 200 includes:

step S210, in response to determining that the target operation type is a combination of arithmetic logic operation and table look-up operation, obtaining a first number of values to be input;

step S220, inputting the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain arithmetic logic operation results output by the first number of arithmetic logic units, respectively;

step S230, inputting the first number of arithmetic logic operation results into the first number of table look-up units, respectively, to obtain table look-up operation results output by the first number of table look-up units, respectively;

step S240, in response to determining that the target operation type is a non-arithmetic logic operation, obtaining a second number of values to be input; and

step S250, inputting the second number of values to be input into the second number of non-arithmetic logic units, respectively, to obtain the non-arithmetic logic operation results output by the second number of non-arithmetic logic units, respectively.

Therefore, by using the first number of first operation units and the second number of non-arithmetic logic units, both the combination of arithmetic logic operation and table look-up operation and the non-arithmetic logic operation can be supported. During operation, the corresponding units can be selected based on the target operation type, and the corresponding number of values to be input is obtained for calculation, which improves operation performance.
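As a rough illustration of steps S210 to S250, the following Python sketch models the selection between the two paths in software. The unit counts, the operation type strings, and the stand-in functions `alu`, `table_lookup`, and `non_alu` are all assumptions made for this sketch; the disclosure describes hardware units and does not prescribe these names or operations.

```python
import math
from typing import List, Sequence

FIRST_NUMBER = 4   # hypothetical count of first operation units (ALU + table look-up unit)
SECOND_NUMBER = 2  # hypothetical count of non-arithmetic logic units


def alu(x: float) -> float:
    """Stand-in arithmetic logic operation: add a bias (assumed for illustration)."""
    return x + 0.5


def table_lookup(x: float) -> float:
    """Stand-in for a table look-up unit; evaluates tanh directly instead of reading a table."""
    return math.tanh(x)


def non_alu(x: float) -> float:
    """Stand-in non-arithmetic logic operation: division by a constant."""
    return x / 4.0


def run(op_type: str, values: Sequence[float]) -> List[float]:
    """Select a path from the target operation type and process one batch of inputs."""
    if op_type == "alu+lut":
        batch = values[:FIRST_NUMBER]                # S210: obtain a first number of values
        alu_out = [alu(v) for v in batch]            # S220: one value per arithmetic logic unit
        return [table_lookup(v) for v in alu_out]    # S230: ALU results feed the look-up units
    if op_type == "non_alu":
        batch = values[:SECOND_NUMBER]               # S240: obtain a second number of values
        return [non_alu(v) for v in batch]           # S250: one value per non-arithmetic logic unit
    raise ValueError(f"unsupported operation type: {op_type}")
```

In this sketch the arithmetic logic result is passed straight to the look-up stage without an intermediate write to storage, which mirrors the motivation given above; the list comprehensions merely stand in for units that would operate in parallel in hardware.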

As described previously, the method 200 may be applied in a deep learning algorithm, for example, to perform the inference computation of a neural network model. The neural network model is a model that has been constructed or trained based on a deep learning method, and may have any of various structures, such as a convolutional neural network model, a feedback neural network model, or a feedforward neural network model. The neural network model may also be a model for performing any of various deep learning-based tasks, such as an image classification model, a target detection model, a visual language model, or a text content understanding model.

In some examples, the neural network model is an image classification model based on a convolutional neural network, and the image classification model is configured to extract a feature map of an image to be classified by using the convolutional neural network, and perform image classification based on the feature map. In this case, the image classification model includes a convolution layer, an average pooling layer, a sampling layer, and an activation layer. The operation process of the sampling layer and the activation layer in the image classification model can be supported by using the above steps S210 to S230, and the division operation process of the average value pooling layer can be supported by using the above steps S240 to S250. In this case, the value to be input may be a pixel value in an image feature map extracted using the convolution layer or the average pooling layer.

In some examples, the first number and the second number may be configured manually according to actual operation requirements, and may be the same or different. For example, when the requirements for arithmetic logic operations and table lookup operations are high and the requirements for non-arithmetic logic operations are low, a larger number of first operation units and a smaller number of non-arithmetic logic units can be configured in the operation device, so that the utilization rate of hardware resources can be improved.

According to some embodiments, the non-arithmetic logic operation comprises at least one of: division, root extraction, logarithm, and exponentiation. The arithmetic logic operation comprises at least one of: addition, subtraction, multiplication, bit logic, and shift operations. Therefore, multiple operation types can be supported, and the operation performance is improved.

It should be understood that the arithmetic logic unit, the table look-up unit and the non-arithmetic logic unit included in the operation device may be further divided according to operation types. For example, the first number of arithmetic logic units may include a third number of adders, a fourth number of multipliers, and a fifth number of shifters, so that the corresponding operation paths can be flexibly configured based on the target operation type to improve the operation performance.

In some application scenarios, it is not necessary to perform the arithmetic logic operation and the table lookup operation in series on the values to be input. Based on this, according to some embodiments, the target operation type may further be an arithmetic logic operation alone or a table lookup operation alone, and the method 200 further comprises: in response to determining that the target operation type is an arithmetic logic operation, obtaining a first number of values to be input; inputting the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain the arithmetic logic operation results output by the first number of arithmetic logic units, respectively; in response to determining that the target operation type is a table lookup operation, obtaining a first number of values to be input; and inputting the first number of values to be input into the first number of table look-up units, respectively, to obtain the table lookup operation results output by the first number of table look-up units, respectively. Therefore, the operation path for the values to be input can be flexibly configured according to the target operation type, which improves operation flexibility.

In the related art, the table lookup operation is usually implemented by combining a table lookup method with piecewise linear fitting. One implementation divides the function to be computed into a number of segments, each of which is fitted with a linear function $f(x) = kx + b$. In this case, the slope and intercept of the fitted linear function (i.e., k and b for each segment) may be stored in a parameter table. The corresponding segment is determined based on the input value, and the operation result corresponding to the input value is calculated based on the k and b parameters of that segment. In this implementation, the number of entries stored in the parameter table is large, and a large amount of storage hardware resources is occupied.
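The slope-and-intercept scheme described above can be sketched in Python as follows. The function names `build_kb_table` and `eval_kb`, the uniform segment width, and the choice of fitting each segment through its two endpoints are assumptions of this sketch, not details given in the text.

```python
import math
from typing import Callable, List, Tuple


def build_kb_table(f: Callable[[float], float], lo: float, hi: float,
                   segments: int) -> Tuple[List[Tuple[float, float]], float]:
    """Fit f on [lo, hi] with one line k*x + b per uniform segment.

    Each line is fitted through the segment's two endpoints (an assumption).
    Returns the parameter table of (k, b) pairs and the segment width.
    """
    width = (hi - lo) / segments
    table = []
    for s in range(segments):
        x0 = lo + s * width
        x1 = x0 + width
        k = (f(x1) - f(x0)) / width   # slope of the segment
        b = f(x0) - k * x0            # intercept of the segment
        table.append((k, b))
    return table, width


def eval_kb(table: List[Tuple[float, float]], width: float, lo: float, x: float) -> float:
    """Locate the segment for x (clamped to the table range) and evaluate k*x + b."""
    s = int((x - lo) / width)
    s = min(max(s, 0), len(table) - 1)
    k, b = table[s]
    return k * x + b
```

With 64 segments this table holds 128 parameters, two per segment, which is the storage cost the paragraph above points out.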

Another implementation stores, in the parameter table, the output values $f(x_k)$ and $f(x_{k+1})$ corresponding to the two endpoints $x_k$ and $x_{k+1}$ of each segment of the function. The corresponding segment is determined based on the input value, and the operation result corresponding to the input value is calculated by linear interpolation between the output values of the two endpoints of that segment. In this implementation, the number of entries stored in the parameter table is reduced, but two output values still need to be read from the parameter table in sequence in each table lookup operation, so data reading efficiency is low.
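A minimal Python sketch of the endpoint-based scheme follows, assuming uniformly spaced endpoints. The names `build_endpoint_table` and `eval_endpoints` are hypothetical and introduced only for illustration.

```python
import math
from typing import Callable, List, Tuple


def build_endpoint_table(f: Callable[[float], float], lo: float, hi: float,
                         segments: int) -> Tuple[List[float], float]:
    """Store only f at the segment endpoints: segments + 1 entries in total."""
    width = (hi - lo) / segments
    return [f(lo + i * width) for i in range(segments + 1)], width


def eval_endpoints(table: List[float], width: float, lo: float, x: float) -> float:
    """Linearly interpolate between the two endpoint outputs of the segment containing x."""
    s = int((x - lo) / width)
    s = min(max(s, 0), len(table) - 2)
    y0 = table[s]        # first read from the parameter table
    y1 = table[s + 1]    # second read from the same table
    t = (x - (lo + s * width)) / width
    return y0 + t * (y1 - y0)
```

For 64 segments the table shrinks to 65 entries, but `eval_endpoints` still reads two entries from the same table per lookup, which corresponds to the sequential-read cost noted above.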

Based on this, the present disclosure provides a method for constructing a lookup table, which is applied to a lookup unit included in the above-mentioned operation device to improve the efficiency of lookup operation. Fig. 3 shows a flowchart of a method 300 of building a look-up table according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the method 300 includes:

step S301, obtaining an initial lookup table, wherein the initial lookup table comprises a plurality of input values which are sequentially arranged and a plurality of output values which are in one-to-one correspondence with the plurality of input values;

step S302, for each input value of the plurality of input values, determining the ordinal position of the input value in the sequence formed by the plurality of input values as the index value corresponding to the input value;

step S303, for each index value of a plurality of index values corresponding to the plurality of input values, in response to determining that the index value is even, storing an output value corresponding to the index value in a first sub lookup table, and in response to determining that the index value is odd, storing an output value corresponding to the index value in a second sub lookup table, wherein the first sub lookup table is stored in a first storage unit corresponding to a first read port, and the second sub lookup table is stored in a second storage unit corresponding to a second read port, and the first and second sub lookup tables are different;

and step S304, constructing the lookup table based on the first sub lookup table and the second sub lookup table.

In the related art, the initial lookup table is commonly stored in a storage unit, such as a read-only memory or a flash memory chip. The applicant has noticed that, in general, each storage unit includes only one read port; that is, when two output values need to be read from the initial lookup table, two read operations must be performed through that read port, and data reading efficiency is low. When a storage unit includes two read ports, the data read bandwidth increases, but the storage unit occupies more hardware resources.

Based on this, by using the method 300, the data in the initial lookup table can be stored in two sub lookup tables in a split manner, that is, the first sub lookup table and the second sub lookup table can be respectively stored in two storage units. In this case, each storage unit only needs to include one read port, and data can be read from the two sub lookup tables simultaneously by using the ports included in the two storage units, so that hardware resources can be saved while data reading efficiency is improved.
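Steps S301 to S304 can be sketched in Python as follows. The function name `build_split_lookup_table` and the use of plain lists to stand in for the two storage units are assumptions of this sketch; index values are taken to start at 0, consistent with the example of Fig. 4 in which $x_0$ is the minimum input value.

```python
from typing import List, Sequence, Tuple


def build_split_lookup_table(input_values: Sequence[float],
                             output_values: Sequence[float]
                             ) -> Tuple[List[float], List[float]]:
    """Split an initial lookup table into even-index and odd-index sub tables.

    input_values must already be sorted (S301). Each list returned here stands
    in for a separate single-read-port storage unit.
    """
    if len(input_values) != len(output_values):
        raise ValueError("each input value needs exactly one output value")
    first_sub: List[float] = []   # outputs whose index value is even
    second_sub: List[float] = []  # outputs whose index value is odd
    for index, _ in enumerate(input_values):   # S302: ordinal position is the index value
        if index % 2 == 0:                     # S303: even index goes to the first sub table
            first_sub.append(output_values[index])
        else:                                  # S303: odd index goes to the second sub table
            second_sub.append(output_values[index])
    return first_sub, second_sub               # S304: the pair forms the lookup table
```

For example, five outputs `[10, 11, 12, 13, 14]` are split into `[10, 12, 14]` and `[11, 13]`.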

As described above, in one example, the function to be computed may be divided into a plurality of segments, and the output values $f(x_k)$ and $f(x_{k+1})$ of the two endpoints $x_k$ and $x_{k+1}$ corresponding to each segment may be stored in the initial lookup table. In the lookup process, the target segment can be determined based on the input value of the table look-up unit, and the output values of the two endpoints corresponding to the target segment are then read to calculate the table lookup operation result corresponding to the input value. The index values of the two endpoints $x_k$ and $x_{k+1}$ of each segment are consecutive, so one is even and the other is odd; with the method 300, the output values of the two endpoints of each segment are therefore stored in the first sub lookup table and the second sub lookup table, respectively. That is, the two output values that need to be read in each lookup are necessarily stored in two different storage units, and they can be read simultaneously through the read ports of the two storage units, which improves data reading efficiency.
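The following Python sketch illustrates why the two endpoint outputs of any segment always come from different sub lookup tables. The function name `read_segment_endpoints` and the row addressing by integer halving are assumptions of this sketch; index values are taken to start at 0.

```python
from typing import List, Tuple


def read_segment_endpoints(first_sub: List[float], second_sub: List[float],
                           k: int) -> Tuple[float, float]:
    """Return (f(x_k), f(x_{k+1})) using exactly one read per sub lookup table.

    first_sub holds outputs with even index values, second_sub those with odd ones.
    """
    # Of the two consecutive indices k and k + 1, exactly one is even and one is odd.
    even_index = k if k % 2 == 0 else k + 1
    odd_index = k + 1 if k % 2 == 0 else k
    even_value = first_sub[even_index // 2]   # one read through the first read port
    odd_value = second_sub[odd_index // 2]    # one read through the second read port
    if k % 2 == 0:
        return even_value, odd_value
    return odd_value, even_value
```

Because each call touches each sub table once, the two reads could be issued in the same cycle on hardware in which each storage unit has a single read port.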

In one example, the difference between any two adjacent input values in the initial lookup table is $LUT_{interval}$, the minimum input value of the plurality of input values in the initial lookup table is $LUT_{min}$, and the maximum input value is $LUT_{max}$. In this case, the number of the plurality of input values is $n + 1$, where $n$ satisfies the following relationship:

$$n = \frac{LUT_{max} - LUT_{min}}{LUT_{interval}}$$

For any input value $LUT_n$ of the plurality of input values, the ordinal number of the input value in the sequence of the plurality of input values, that is, the index value corresponding to the input value, can be determined based on the difference $(LUT_n - LUT_{min})$ between this input value and the minimum input value and on $LUT_{interval}$.
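As a small worked illustration of these relationships, the sketch below assumes that the index value is the quotient of the difference and the interval, which is consistent with the relationship for $n$; the concrete numbers and the function names `table_size` and `index_of` are illustrative assumptions only.

```python
def table_size(lut_min, lut_max, lut_interval):
    """Number of input values, n + 1, for an evenly spaced table."""
    n = round((lut_max - lut_min) / lut_interval)
    return n + 1


def index_of(lut_value, lut_min, lut_interval):
    """Index value of a table input: its offset from the minimum, in intervals."""
    return round((lut_value - lut_min) / lut_interval)


# Illustrative parameters (not from the disclosure).
lut_min, lut_max, lut_interval = -4.0, 4.0, 0.5
count = table_size(lut_min, lut_max, lut_interval)
inputs = [lut_min + i * lut_interval for i in range(count)]
```

With these parameters the table holds 17 input values, and the input value -2.5 has index value 3.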

Fig. 4 shows a schematic diagram of a lookup table according to an exemplary embodiment of the present disclosure. In this example, $x_0 = LUT_{min}$ and $x_{i+1} - x_i = LUT_{interval}$ ($i \le n - 1$). Based on the index value $i$ corresponding to each input value $x_i$, the corresponding $f(x_i)$ can be stored in the first sub-lookup table or the second sub-lookup table, so as to improve the data read bandwidth.

Fig. 5 shows a flowchart of a part of an example process of an operation method according to an exemplary embodiment of the present disclosure. As shown in fig. 5, according to some embodiments, the table lookup unit includes a lookup table constructed by the method 300, and the step S230 of inputting the first number of arithmetic logic operation results into the first number of table lookup units to obtain the table lookup operation results output by each of the first number of table lookup units includes:

step S231, for each arithmetic logic operation result in the first number of arithmetic logic operation results, determining a first input value and a second input value which are closest to the arithmetic logic operation result from the input values respectively corresponding to the plurality of output values included in the lookup table, wherein the first output value and the second output value respectively corresponding to the first input value and the second input value are respectively stored in a first sub-lookup table and a second sub-lookup table included in the lookup table;

step S232, determining a first sub-index value and a second sub-index value based on a first index value and a second index value respectively corresponding to the first input value and the second input value, where the first sub-index value and the second sub-index value can represent arrangement numbers of the first output value and the second output value in the first sub-lookup table and the second sub-lookup table;

step S233, reading the first output value from the first sub-lookup table by using the first read port based on the first sub-index value;

step S234, reading the second output value from the second sub lookup table by using the second read port based on the second sub index value; and

step S235, calculating a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value and the arithmetic logic operation result.

Thus, the first input value and the second input value closest to the arithmetic logic operation result of the input table lookup unit can be determined, and the table lookup operation result corresponding to the arithmetic logic operation result can be calculated based on the first output value and the second output value corresponding to the first input value and the second input value. Meanwhile, by reading the first output value and the second output value by using the lookup table constructed by the method 300, the storage positions of the first output value and the second output value in the sub lookup table can be easily and quickly determined by using the first index value and the second index value respectively corresponding to the first input value and the second input value, so that efficient data reading is realized, and the operation efficiency is improved.

Illustratively, the table lookup unit may be used to calculate a sigmoid function, a tanh function, a gelu function, a natural exponential function, and the like, without limitation.
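Steps S231 to S235 can be sketched end to end for a sigmoid function. This is a software illustration under stated assumptions, not the hardware design: the table range and interval are chosen arbitrarily, inputs outside the table range are clamped (the text does not say how they are handled), and even index values are assumed to be stored in the first sub-lookup table.

```python
import math

# Illustrative table parameters (assumptions, not from the disclosure).
LUT_MIN, LUT_MAX, LUT_INTERVAL = -8.0, 8.0, 0.25
N = int((LUT_MAX - LUT_MIN) / LUT_INTERVAL)            # n segments, n + 1 entries
INPUTS = [LUT_MIN + i * LUT_INTERVAL for i in range(N + 1)]
OUTPUTS = [1.0 / (1.0 + math.exp(-x)) for x in INPUTS]
FIRST_SUB, SECOND_SUB = OUTPUTS[0::2], OUTPUTS[1::2]   # even / odd index values


def lut_sigmoid(x):
    """Approximate sigmoid(x) from two sub-tables plus linear interpolation."""
    x = min(max(x, LUT_MIN), LUT_MAX)                  # clamping is an assumption
    # S231: locate the two nearest table inputs x_k and x_{k+1}.
    k = min(int(math.floor((x - LUT_MIN) / LUT_INTERVAL)), N - 1)
    even, odd = (k, k + 1) if k % 2 == 0 else (k + 1, k)
    # S232: sub-index values are the positions inside each sub-table.
    first_sub_index, second_sub_index = even >> 1, odd >> 1
    # S233 / S234: one read per sub-table (in hardware these happen in parallel).
    f_even = FIRST_SUB[first_sub_index]
    f_odd = SECOND_SUB[second_sub_index]
    f_k, f_k1 = (f_even, f_odd) if k % 2 == 0 else (f_odd, f_even)
    # S235: weight the two outputs by the distance of x from each endpoint.
    t = (x - INPUTS[k]) / LUT_INTERVAL
    return (1.0 - t) * f_k + t * f_k1
```

For example, `lut_sigmoid(0.3)` stays within about one thousandth of the exact sigmoid value for this interval; a finer interval would reduce the error at the cost of a larger table.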

According to some embodiments, the difference between any two adjacent input values in the initial lookup table is the same, and determining the first input value and the second input value that are closest to the arithmetic logic operation result from the input values corresponding to the output values included in the lookup table comprises: acquiring the minimum input value of the plurality of input values and the difference value between any two adjacent input values; and determining, based on the minimum input value, the arithmetic logic operation result, and the difference value, the first input value and the second input value that are closest to the arithmetic logic operation result. Therefore, the first input value and the second input value closest to the arithmetic logic operation result can be determined more simply and quickly, and the operation efficiency is improved.

In one example, the difference between any two adjacent input values in the lookup table is $LUT_{interval}$, and the minimum input value of the plurality of input values in the initial lookup table is $LUT_{min}$. In this case, when the arithmetic logic operation result input to the table lookup unit is $x_{target}$, a rounding operation can be performed on the quotient of the difference $(x_{target} - LUT_{min})$ between this input value and the minimum input value and $LUT_{interval}$ to determine the first input value and the second input value that are closest to the arithmetic logic operation result. For example, when operating with the lookup table shown in fig. 4, if the rounding operation on the quotient of $(x_{target} - LUT_{min})$ and $LUT_{interval}$ yields the value 3, it can be determined that the first input value and the second input value closest to the arithmetic logic operation result are $x_3$ and $x_4$, respectively.
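The rounding step can be illustrated as follows. The sketch assumes that the rounding operation rounds down, which matches the example in which the value 3 selects $x_3$ and $x_4$; the clamping of out-of-range targets and the name `nearest_neighbors` are assumptions added for illustration.

```python
import math


def nearest_neighbors(x_target, lut_min, lut_interval, n):
    """Return the index values (k, k + 1) of the two table inputs bracketing x_target."""
    k = math.floor((x_target - lut_min) / lut_interval)  # round the quotient down
    k = max(0, min(k, n - 1))                            # keep k + 1 inside the table (assumption)
    return k, k + 1


# Illustrative table: x_0 = 0.0, interval 0.5, n = 8 segments.
k, k_next = nearest_neighbors(1.7, lut_min=0.0, lut_interval=0.5, n=8)
```

Here the quotient is 3.4, so the target lies between $x_3 = 1.5$ and $x_4 = 2.0$.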

According to some embodiments, the calculating, in step S235, a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value, and the arithmetic logic operation result includes: determining weights corresponding to the first output value and the second output value respectively based on a first difference value of the first input value and the arithmetic logic operation result and a second difference value of the second input value and the arithmetic logic operation result; and performing weighted calculation on the first output value and the second output value based on the weight to obtain a table look-up operation result corresponding to the arithmetic logic operation result.

FIG. 6 is a diagram illustrating a table lookup operation process according to an exemplary embodiment of the present disclosure. In some examples, the target input value that is input to the table lookup unit is the arithmetic logic operation result output by the arithmetic logic unit, and in other examples, the target input value that is input to the table lookup unit is the value to be input.

Referring to fig. 6, after the target input value is obtained, the first index value and the second index value respectively corresponding to the two input values $x_k$ and $x_{k+1}$ closest to the target input value may be determined based thereon, and then a first sub-index value and a second sub-index value can be determined, so that $f(x_k)$ and $f(x_{k+1})$ are read from the first sub-lookup table and the second sub-lookup table, respectively.

Specific embodiments of the table lookup operation will be further described below in conjunction with the lookup table shown in fig. 4.

Referring to fig. 4, the plurality of index values corresponding to the plurality of input values are numbered starting from 0 and are represented in binary. In this case, when the two input values closest to the target input value are $x_k$ and $x_{k+1}$ and $k$ is even, the first index value is $k$ and the second index value is $(k + 1)$; the first output value $f(x_k)$ is stored in the first sub-lookup table, and the second output value $f(x_{k+1})$ is stored in the second sub-lookup table. By performing a shift operation on the first index value $k$, the first sub-index value and the second sub-index value, both equal to $(k \gg 1)$, are obtained. When $k$ is odd, the first index value is $(k + 1)$ and the second index value is $k$; the first output value $f(x_{k+1})$ is stored in the first sub-lookup table, and the second output value $f(x_k)$ is stored in the second sub-lookup table. By performing a shift operation and an addition operation on the second index value $k$, the first sub-index value $[(k \gg 1) + 1]$ and the second sub-index value $(k \gg 1)$ are obtained.

It can be seen that the second sub-index value $(k \gg 1)$ can be obtained by performing a shift operation on the index value $k$ corresponding to $x_k$, the smaller of the two input values closest to the target input value. By using a gate to make a selection based on the parity of $k$, one of $(k \gg 1)$ and $[(k \gg 1) + 1]$ is determined as the first sub-index value. Based on the first sub-index value and the second sub-index value, $f(x_k)$ and $f(x_{k+1})$ can be read from the first sub-lookup table and the second sub-lookup table in parallel.
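The shift-and-select logic can be checked with a short sketch. It assumes the even/odd split described for fig. 4; the function name `sub_indices` and the placeholder table contents are hypothetical.

```python
def sub_indices(k):
    """Sub-index values for reading f(x_k) and f(x_{k+1}) from the two sub-tables."""
    base = k >> 1                                     # shift operation on the smaller index
    first_sub_index = base + 1 if (k & 1) else base   # selection based on the parity of k
    second_sub_index = base
    return first_sub_index, second_sub_index


# Placeholder outputs standing in for f(x_0) .. f(x_8).
outputs = list(range(100, 109))
first_sub, second_sub = outputs[0::2], outputs[1::2]

# For every segment, the two reads return exactly f(x_k) and f(x_{k+1}).
pairs = []
for k in range(len(outputs) - 1):
    i1, i2 = sub_indices(k)
    pairs.append((first_sub[i1], second_sub[i2]))
```

For an even $k$ the first sub-table supplies $f(x_k)$, while for an odd $k$ it supplies $f(x_{k+1})$, so a consumer of the two reads needs the parity of $k$ to know which value belongs to which endpoint.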

In one example, calculating the table lookup operation result f (x) corresponding to the target input value x based on the first input value, the second input value, the first output value, the second output value and the target input value may be implemented by using the following linear interpolation formula:
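The formula is not reproduced in this text. As a sketch under that caveat, the standard linear interpolation between the two nearest table entries, which is consistent with the weighted calculation described for step S235, would read as follows; this is an assumed reconstruction and not necessarily the expression used in the original filing:

```latex
f(x) \approx f(x_k) + \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}\,(x - x_k)
     = \frac{x_{k+1} - x}{LUT_{interval}}\, f(x_k) + \frac{x - x_k}{LUT_{interval}}\, f(x_{k+1}),
\qquad x_k \le x \le x_{k+1}
```

The second form makes the two weights explicit: each output value is weighted by the distance of $x$ from the opposite endpoint, divided by $x_{k+1} - x_k = LUT_{interval}$, so the weights sum to 1.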

According to another aspect of the present disclosure, an arithmetic device is provided. Fig. 7 shows a block diagram of a computing device 700 according to an exemplary embodiment of the present disclosure. As shown in fig. 7, the apparatus 700 includes:

a first number of first operation units 710, each of the first number of first operation units including an arithmetic logic unit 711 and a table look-up unit 712;

a second number of non-arithmetic logic units 720;

an obtaining unit 730 configured to obtain a first number of values to be input in response to determining that the target operation type is a combination of an arithmetic logic operation and a table look-up operation; and

an input unit 740 configured to input the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain the arithmetic logic operation results output by the first number of arithmetic logic units, respectively; and inputting the first number of arithmetic logic operation results into the first number of table lookup units respectively to obtain the table lookup operation results output by the first number of table lookup units respectively,

the obtaining unit is further configured to obtain a second number of values to be input in response to determining that the target operation type is a non-arithmetic logic operation,

the input unit is further configured to input the second number of values to be input into the second number of non-arithmetic logic units, respectively, to obtain the non-arithmetic logic operation results output by the second number of non-arithmetic logic units, respectively.

The operations of the units 730-740 of the computing apparatus 700 are similar to the operations of the steps S210-S250 described above, and are not repeated herein.

According to some embodiments, the target operation type further includes an arithmetic logic operation and a table lookup operation. The obtaining unit 730 is further configured to obtain a first number of values to be input in response to determining that the target operation type is the arithmetic logic operation, and the input unit 740 is further configured to input the first number of values to be input into the first number of arithmetic logic units respectively to obtain arithmetic logic operation results output by the first number of arithmetic logic units respectively. The obtaining unit 730 is further configured to obtain a first number of values to be input in response to determining that the target operation type is the table lookup operation, and the input unit 740 is further configured to input the first number of values to be input into the first number of table lookup units respectively to obtain table lookup operation results output by the first number of table lookup units respectively.
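The four operation paths can be summarized by a small software model. This is only a sketch: the string labels for the operation types, the function name `run_operation`, and the callables standing in for the hardware units are all hypothetical, and `math.tanh` is used as an exact stand-in for a table lookup unit.

```python
import math


def run_operation(op_type, values, alu_op=None, lut_op=None, non_alu_op=None):
    """Route a group of values through the path selected by op_type.

    op_type and the callables are hypothetical stand-ins for the control
    signals and hardware units; they are not names from the disclosure.
    """
    if op_type == "alu+lut":   # combination: each ALU result feeds a table lookup unit
        return [lut_op(alu_op(v)) for v in values]
    if op_type == "alu":       # arithmetic logic units only
        return [alu_op(v) for v in values]
    if op_type == "lut":       # table lookup units only
        return [lut_op(v) for v in values]
    if op_type == "non_alu":   # non-arithmetic logic units
        return [non_alu_op(v) for v in values]
    raise ValueError(f"unknown operation type: {op_type}")


values = [1.0, 4.0, 9.0]
combined = run_operation("alu+lut", values, alu_op=lambda v: 2.0 * v, lut_op=math.tanh)
roots = run_operation("non_alu", values, non_alu_op=math.sqrt)
```

In the device itself the selection is made by enabling or bypassing units rather than by branching in software.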

In some examples, the operation device 700 may constitute a vector operation unit in a vector processor to perform operations in parallel for a plurality of scalar data (i.e., a plurality of values to be input included in a set of input data) included in one vector data, so as to improve operation efficiency.

Fig. 8 shows a block diagram of a vector processor 800 according to an exemplary embodiment of the present disclosure.

As shown in fig. 8, the vector processor 800 includes an instruction storage unit 801, an instruction fetch unit 802, an instruction decode unit 803, a control unit 804, an input data storage unit 805, an arithmetic device 700, an arbitration selector 806, a distributor 807, and an output data storage unit 808.

The instruction storage unit 801 is configured to store an artificial instruction, the instruction fetch unit 802 is configured to read the artificial instruction from the instruction storage unit 801 and analyze the artificial instruction to obtain an operation parameter for controlling a vector processing process, and the control unit 804 is configured to generate a corresponding control signal based on the operation parameter, so as to control the arithmetic device 700, the arbitration selector 806, and the distributor 807 to perform vector processing.

In this example, the operation parameters may include, for example, a target operation type for input data, the first number and the second number, and the like, and the control signals may include, for example, an operation enable signal for each operation unit in the operation device, a read-write control signal and read-write address information for the input data storage unit 805, the arbitration selector 806, the distributor 807, and the output data storage unit 808, and the like. By using the operation enable signal for each operation unit in the operation device 700, the operation path for the input data can be flexibly configured to realize the operation of the input data according to the target operation type. Based on the above control signals, the arbitration selector 806 can read a plurality of values to be input included in the input data from a plurality of addresses of the input data storage unit 805, and further select a first number or a second number of values to be input from the values in each clock cycle, so that the arithmetic device 700 can perform an operation based on the first number or the second number of values to be input, and the distributor 807 can acquire a plurality of operation results corresponding to the plurality of values to be input, which are respectively written into a plurality of addresses of the output data storage unit 808. Therefore, parallel operation of a plurality of values to be input contained in a group of input data can be realized based on a simple manual instruction, the number of instructions is reduced, and the operation efficiency is improved.

In practical applications, the computation load of the arithmetic logic operation and the table lookup operation is generally higher than that of the non-arithmetic logic operation, that is, the computation load of the first operation unit 710 is higher than that of the non-arithmetic logic unit 720. Based on this, in some examples, the first number is greater than the second number, and the input data storage unit 805 stores a first number of values to be input. In this case, in response to the target operation type being at least one of an arithmetic logic operation or a table lookup operation, the first number of values to be input may be transmitted directly to the first operation units 710, and the arithmetic logic unit 711 and the table lookup unit 712 are further configured by using the operation enable signal, so as to implement a highly parallel operation and improve the operation efficiency. In response to the target operation type being a non-arithmetic logic operation, the arbitration selector 806 reads the first number of values to be input, selects no more than the second number of them in each clock cycle, and transmits the selected values to the non-arithmetic logic units 720 for operation, and the distributor 807 distributes the corresponding operation results, so that the non-arithmetic logic operation is implemented in a combined parallel and serial manner and the utilization rate of hardware resources is improved.
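The combined parallel and serial handling of non-arithmetic logic operations can be modelled in software as follows. This is a behavioural sketch only: the function name `run_non_alu`, the cycle counter, and the example sizes (16 values, 4 non-arithmetic logic units) are assumptions for illustration.

```python
import math


def run_non_alu(values, second_number, op):
    """Process `values` in groups of at most `second_number` per modelled clock cycle."""
    results = [None] * len(values)
    cycles = 0
    for start in range(0, len(values), second_number):
        # Arbitration selector: pick at most `second_number` pending values.
        lanes = range(start, min(start + second_number, len(values)))
        for i in lanes:
            # Each selected value goes to one non-arithmetic logic unit; the
            # distributor writes the result back to the matching output address.
            results[i] = op(values[i])
        cycles += 1
    return results, cycles


first_number, second_number = 16, 4
values = [float(i * i) for i in range(first_number)]
roots, cycles = run_non_alu(values, second_number, math.sqrt)
```

With these sizes the 16 values are processed in 4 modelled cycles, using only a quarter of the units that a fully parallel design would need.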

According to some embodiments, the non-arithmetic logic operation comprises at least one of: division, root extraction, logarithm, and exponentiation.

According to some embodiments, the arithmetic logic operation comprises at least one of: addition, subtraction, multiplication, bit logic, and shift operations.

According to some embodiments, the table lookup unit comprises a lookup table constructed using the method 300 as described above, and the table lookup unit further comprises: a first determining subunit configured to determine, for each arithmetic logic operation result of the first number of arithmetic logic operation results, a first input value and a second input value that are closest to the arithmetic logic operation result from among input values respectively corresponding to a plurality of output values included in the lookup table, wherein the first output value and the second output value respectively corresponding to the first input value and the second input value are stored in a first sub-lookup table and a second sub-lookup table included in the lookup table, respectively; a second determining subunit, configured to determine a first sub-index value and a second sub-index value based on a first index value and a second index value respectively corresponding to the first input value and the second input value, where the first sub-index value and the second sub-index value can represent arrangement numbers of the first output value and the second output value in the first sub-lookup table and the second sub-lookup table; a first read port configured to read the first output value from the first sub-lookup table based on the first sub-index value; a second read port configured to read the second output value from the second sub-lookup table based on the second sub-index value; and a calculation subunit configured to calculate a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value, and the arithmetic logic operation result.

According to some embodiments, the first determining subunit is configured to: obtain the minimum input value of the plurality of input values and the difference value between any two adjacent input values; and determine, based on the minimum input value, the arithmetic logic operation result, and the difference value, a first input value and a second input value that are closest to the arithmetic logic operation result.

According to some embodiments, the computing subunit is configured to: determining weights corresponding to the first output value and the second output value respectively based on a first difference value of the first input value and the arithmetic logic operation result and a second difference value of the second input value and the arithmetic logic operation result; and performing weighted calculation on the first output value and the second output value based on the weight to obtain a table look-up operation result corresponding to the arithmetic logic operation result.

According to another aspect of the present disclosure, there is also provided a chip including the arithmetic device as described above.

According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described operation method.

According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the above-described operation method.

According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the above operation method when being executed by a processor.

Referring to fig. 9, a block diagram of the structure of an electronic device 900, which may serve as a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. An electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906, an output unit 907, a storage unit 908, and a communication unit 909. The input unit 906 may be any type of device capable of inputting information to the device 900; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 907 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 908 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 909 allows the device 900 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth (TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 901 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the operation method. For example, in some embodiments, the operation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the operation method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the operation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical aspects of the present disclosure can be achieved.

While embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely illustrative embodiments or examples and that the scope of the invention is not to be limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims

1. An arithmetic method performed by an arithmetic device comprising a first number of first arithmetic units and a second number of non-arithmetic logic units, each first arithmetic unit of the first number of first arithmetic units comprising an arithmetic logic unit and a table lookup unit, the method comprising:

in response to determining that the target operation type is a combination of an arithmetic logic operation and a table lookup operation,

acquiring a first number of values to be input;

inputting the first number of values to be input into the first number of arithmetic logic units respectively to obtain arithmetic logic operation results output by the first number of arithmetic logic units respectively; and

inputting the first number of arithmetic logic operation results into the first number of table look-up units respectively to obtain table look-up operation results output by the first number of table look-up units respectively; and

in response to determining that the target operation type is a non-arithmetic logic operation,

acquiring a second number of values to be input; and

inputting the second number of values to be input into the second number of non-arithmetic logic units respectively to obtain non-arithmetic logic operation results output by the second number of non-arithmetic logic units respectively.

2. The method of claim 1, wherein the target operation types further comprise an arithmetic logic operation and a table lookup operation, the method further comprising:

in response to determining that the target operation type is an arithmetic logic operation,

acquiring a first number of values to be input; and

inputting the first number of values to be input into the first number of arithmetic logic units respectively to obtain the arithmetic logic operation results output by the first number of arithmetic logic units respectively; and

in response to determining that the target operation type is a table lookup operation,

acquiring a first number of values to be input; and

inputting the first number of values to be input into the first number of table look-up units respectively to obtain table look-up operation results output by the first number of table look-up units respectively.
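The routing recited in claims 1 and 2 can be sketched in software as follows. This is a minimal illustration only, not the claimed hardware: the string labels for the operation types and the callables `alu`, `lut`, and `non_alu` are hypothetical stand-ins for the arithmetic logic units, table look-up units, and non-arithmetic logic units, and the list comprehension models one value per unit, whereas the units of the device operate in parallel.

```python
import math
from typing import Callable, List, Sequence

Unit = Callable[[float], float]


def run_operation(op_type: str, values: Sequence[float],
                  alu: Unit, lut: Unit, non_alu: Unit) -> List[float]:
    """Route a batch of values according to the target operation type.

    The op_type labels are hypothetical names chosen for this sketch.
    """
    if op_type == "alu+lut":
        # Combined path: each ALU result feeds the paired table look-up unit.
        return [lut(alu(v)) for v in values]
    if op_type == "alu":
        # ALU-only path (claim 2).
        return [alu(v) for v in values]
    if op_type == "lut":
        # Table look-up only path (claim 2).
        return [lut(v) for v in values]
    if op_type == "non_alu":
        # Non-arithmetic logic path, e.g. division, root, logarithm.
        return [non_alu(v) for v in values]
    raise ValueError(f"unknown operation type: {op_type}")


if __name__ == "__main__":
    double = lambda v: v * 2.0      # stand-in ALU operation
    plus_one = lambda v: v + 1.0    # stand-in table look-up
    print(run_operation("alu+lut", [1.0, 2.0], double, plus_one, math.sqrt))
    print(run_operation("non_alu", [4.0, 9.0], double, plus_one, math.sqrt))
```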

3. The method of claim 1 or 2, wherein the non-arithmetic logic operation comprises at least one of:

division, root extraction, logarithm, and exponential.

4. The method of any one of claims 1-3, wherein the arithmetic logic operation comprises at least one of:

addition, subtraction, multiplication, bit logic, and shift operations.

5. The method of any one of claims 1-4, wherein the table lookup unit comprises a lookup table constructed using:

acquiring an initial lookup table, wherein the initial lookup table comprises a plurality of input values which are sequentially arranged and a plurality of output values which are in one-to-one correspondence with the input values;

for each input value in the plurality of input values, determining an arrangement ordinal number of the input value in a sequence consisting of the plurality of input values as an index value corresponding to the input value;

for each of a plurality of index values corresponding to the plurality of input values,

in response to determining that the index value is an even number, storing an output value corresponding to the index value in a first sub-lookup table; and

in response to determining that the index value is an odd number, storing an output value corresponding to the index value in a second sub-lookup table, wherein the first sub-lookup table is stored in a first storage unit corresponding to a first read port, the second sub-lookup table is stored in a second storage unit corresponding to a second read port, and the first and second sub-lookup tables are different; and

constructing the lookup table based on the first sub-lookup table and the second sub-lookup table,

and wherein the inputting the first number of arithmetic logic operation results into the first number of table lookup units to obtain the table lookup operation results output by each of the first number of table lookup units comprises:

for each arithmetic logic operation result of the first number of arithmetic logic operation results,

determining a first input value and a second input value which are closest to the arithmetic logic operation result from input values respectively corresponding to a plurality of output values included in the lookup table, wherein a first output value and a second output value respectively corresponding to the first input value and the second input value are respectively stored in the first sub-lookup table and the second sub-lookup table included in the lookup table;

determining a first sub-index value and a second sub-index value based on a first index value and a second index value respectively corresponding to the first input value and the second input value, wherein the first sub-index value and the second sub-index value can represent arrangement ordinal numbers of the first output value and the second output value in the first sub-lookup table and the second sub-lookup table;

reading the first output value from the first sub-lookup table using the first read port based on the first sub-index value;

reading the second output value from the second sub-lookup table using the second read port based on the second sub-index value; and

calculating a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value and the arithmetic logic operation result.
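The even/odd split and the two-port read of claim 5 can be illustrated with a short sketch. It assumes zero-based index values and assumes that the sub-index value is the index value divided by two (integer division); the claim only requires that the sub-index values represent the positions within the sub-lookup tables. The function names are hypothetical. Because two adjacent index values always have opposite parity, the two output values sit in different sub-lookup tables and can be fetched through separate read ports at the same time.

```python
from typing import List, Sequence, Tuple


def split_lookup_table(outputs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split output values by index parity (zero-based indices assumed).

    Even index values go to the first sub-lookup table,
    odd index values go to the second sub-lookup table.
    """
    first = list(outputs[0::2])
    second = list(outputs[1::2])
    return first, second


def read_adjacent(first: Sequence[float], second: Sequence[float],
                  idx: int) -> Tuple[float, float]:
    """Fetch the output values at index values idx and idx + 1.

    Returns (value from first sub-table, value from second sub-table).
    Sub-index = index // 2 is an assumption made for this sketch.
    """
    if idx % 2 == 0:
        even_idx, odd_idx = idx, idx + 1
    else:
        even_idx, odd_idx = idx + 1, idx
    first_out = first[even_idx // 2]    # models the first read port
    second_out = second[odd_idx // 2]   # models the second read port
    return first_out, second_out


if __name__ == "__main__":
    first, second = split_lookup_table([10.0, 11.0, 12.0, 13.0, 14.0])
    print(first, second)
    print(read_adjacent(first, second, 1))
    print(read_adjacent(first, second, 2))
```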

6. The method of claim 5, wherein the difference between any two adjacent input values in the initial lookup table is the same, and the determining the first input value and the second input value that are closest to the arithmetic logic operation result from the input values corresponding to the output values included in the lookup table comprises:

acquiring the minimum input value of the plurality of input values and the difference value between any two adjacent input values; and

determining, based on the minimum input value, the arithmetic logic operation result, and the difference value, the first input value and the second input value that are closest to the arithmetic logic operation result.
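When the input values are equally spaced, as in claim 6, the two nearest input values can be computed directly from the minimum input value and the spacing, without searching the table. The sketch below is one possible reading under stated assumptions: the function name is hypothetical, index values are zero-based, and clamping of out-of-range results to the table ends is an assumption of this sketch that the claim does not address.

```python
import math
from typing import Tuple


def nearest_inputs(x: float, x_min: float, step: float,
                   n: int) -> Tuple[int, float, float]:
    """Return (lower index, lower input value, upper input value) bracketing x.

    x_min is the minimum input value, step is the common difference between
    adjacent input values, and n is the number of input values in the table.
    """
    idx = math.floor((x - x_min) / step)
    # Assumption: clamp to the valid range so idx and idx + 1 both exist.
    idx = max(0, min(idx, n - 2))
    lower = x_min + idx * step
    return idx, lower, lower + step


if __name__ == "__main__":
    # Table inputs 0.0, 0.25, 0.5, 0.75, 1.0; the result 0.6 lies in [0.5, 0.75].
    print(nearest_inputs(0.6, 0.0, 0.25, 5))
```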

7. The method of claim 5 or 6, wherein said calculating a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value, and the arithmetic logic operation result comprises:

determining weights corresponding to the first output value and the second output value respectively based on a first difference value of the first input value and the arithmetic logic operation result and a second difference value of the second input value and the arithmetic logic operation result; and

performing weighted calculation on the first output value and the second output value based on the weights to obtain a table look-up operation result corresponding to the arithmetic logic operation result.
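One way to realize the weighting of claim 7 is linear interpolation, sketched below. This is an assumption: the claim states only that the weights are determined from the two difference values, and other weighting schemes would also fit that wording. The function and variable names are hypothetical.

```python
def interpolate(x: float, x1: float, x2: float, y1: float, y2: float) -> float:
    """Distance-weighted sum of two output values (linear interpolation).

    x is the arithmetic logic operation result, (x1, y1) and (x2, y2) are the
    two nearest table entries.
    """
    d1 = abs(x - x1)   # first difference value
    d2 = abs(x2 - x)   # second difference value
    total = d1 + d2
    if total == 0:
        # Degenerate case: both input values coincide with x.
        return y1
    w1 = d2 / total    # the closer x is to x1, the larger the weight on y1
    w2 = d1 / total
    return w1 * y1 + w2 * y2


if __name__ == "__main__":
    print(interpolate(0.5, 0.0, 1.0, 10.0, 20.0))
```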

8. An arithmetic device comprising:

a first number of first arithmetic units, each of the first number of first arithmetic units comprising an arithmetic logic unit and a table lookup unit;

a second number of non-arithmetic logic units;

an acquisition unit configured to acquire a first number of values to be input in response to determining that a target operation type is a combination of an arithmetic logic operation and a table look-up operation; and

an input unit configured to input the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain arithmetic logic operation results output by the first number of arithmetic logic units, respectively; and to input the first number of arithmetic logic operation results into the first number of table lookup units, respectively, to obtain the table lookup operation results output by the first number of table lookup units, respectively,

9. The apparatus of claim 8, wherein the target operation types further comprise an arithmetic logic operation and a table lookup operation,

the acquisition unit is further configured to acquire a first number of values to be input in response to determining that the target operation type is an arithmetic logic operation,

the input unit is further configured to input the first number of values to be input into the first number of arithmetic logic units, respectively, to obtain arithmetic logic operation results output by the first number of arithmetic logic units, respectively,

the acquisition unit is further configured to acquire a first number of values to be input in response to determining that the target operation type is a table lookup operation, and

the input unit is further configured to input the first number of values to be input into the first number of table lookup units, respectively, so as to obtain table lookup operation results output by the first number of table lookup units, respectively.

10. The apparatus of claim 8 or 9, wherein the non-arithmetic logic operation comprises at least one of:

division, root extraction, logarithm, and exponential.

11. The apparatus of any one of claims 8-10, wherein the arithmetic logic operation comprises at least one of:

addition, subtraction, multiplication, bit logic, and shift operations.

12. The apparatus of any one of claims 8-11, wherein the table lookup unit comprises a lookup table constructed using:

for each index value of a plurality of index values corresponding to the plurality of input values,

and wherein the table lookup unit further comprises:

a first determining subunit configured to determine, for each arithmetic logic operation result of the first number of arithmetic logic operation results, a first input value and a second input value that are closest to the arithmetic logic operation result from among input values respectively corresponding to a plurality of output values included in the lookup table, wherein the first output value and the second output value respectively corresponding to the first input value and the second input value are respectively stored in a first sub-lookup table and a second sub-lookup table included in the lookup table;

a second determining subunit configured to determine a first sub-index value and a second sub-index value based on a first index value and a second index value respectively corresponding to the first input value and the second input value, wherein the first sub-index value and the second sub-index value can represent arrangement ordinal numbers of the first output value and the second output value in the first sub-lookup table and the second sub-lookup table;

a first read port configured to read the first output value from the first sub-lookup table based on the first sub-index value;

a second read port configured to read the second output value from the second sub-lookup table based on the second sub-index value; and

a calculation subunit configured to calculate a table lookup operation result corresponding to the arithmetic logic operation result based on the first input value, the second input value, the first output value, the second output value, and the arithmetic logic operation result.

13. The apparatus of claim 12, wherein the first determining subunit is configured to:

14. The apparatus of claim 12 or 13, wherein the calculation subunit is configured to:

15. A chip comprising the apparatus of any one of claims 8-14.

16. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

17. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

18. A computer program product comprising a computer program, wherein the computer program implements the method of any one of claims 1-7 when executed by a processor.