WO2023072226A1 - Multi-level lookup table circuit, function solving method and related device - Google Patents

Multi-level lookup table circuit, function solving method and related device

Info

Publication number
WO2023072226A1
WO2023072226A1 PCT/CN2022/128135 CN2022128135W WO2023072226A1 WO 2023072226 A1 WO2023072226 A1 WO 2023072226A1 CN 2022128135 W CN2022128135 W CN 2022128135W WO 2023072226 A1 WO2023072226 A1 WO 2023072226A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
subset
objective function
lookup table
input sequence
Prior art date
Application number
PCT/CN2022/128135
Other languages
French (fr)
Chinese (zh)
Inventor
孟畅
钱炜慷
申小龙
倪磊滨
吴志航
吴威
赵俊峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023072226A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to a multi-level lookup table circuit, a function solving method and related equipment.
  • LUT lookup table
  • Embodiments of the present application provide a multi-level lookup table circuit, a function solving method and related equipment. All functions can be decomposed into approximate Boolean functions, so as to solve the output value of the function through the cascade of the first module and the second module. In addition, the delay and energy consumption of the circuit corresponding to the function can be reduced by cascading multiple modules.
  • the first aspect of the embodiment of the present application provides a multi-level lookup table circuit, which can be applied to optical modules, wireless, neural networks and other scenarios. In these scenarios, the circuit is used to solve the output value of the objective function based on multiple lookup tables.
  • the plurality of lookup tables include a first lookup table and a second lookup table, the first input sequence of the objective function includes a first subset and a second subset; the circuit includes a first module and a second module.
  • the first module is configured to determine the output value of the first function based on the first subset and the first lookup table, and the first function is a nested function in the objective function.
  • the second module is configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  • the circuit can solve the output value of the objective function by means of multiple lookup tables.
  • the first module and the second module can determine outputs of different functions based on different look-up tables. That is, the first module can determine the output of the first function based on the first subset and the first lookup table, and the second module can determine the output of the objective function based on the second subset, the second lookup table, and the output of the first function.
  • Decomposing the objective function into a Boolean function improves the efficiency of solving the objective function and can obtain multi-level logical decomposition results.
  • the area, delay and energy consumption of the circuit corresponding to the objective function can also be reduced.
  • the above-mentioned circuit further includes a scrambling module; the scrambling module is configured to obtain the second input sequence of the objective function, scramble the ordering of the second input sequence to obtain the first input sequence, and decompose the first input sequence to obtain the first subset and the second subset; the scrambling module is also used to send the first subset to the first module, and send the second subset to the second module.
  • the second input sequence is reordered by the scrambling module. The more orderings are tried, the more likely it is that the result approaches the objective function, which in turn allows the subsequent approximate processing of the truth table to preserve the expressive ability of the objective function.
  • the above circuit further includes a configuration module; the configuration module is configured to approximate the truth table corresponding to the objective function to obtain an approximated truth table, and decomposing the approximated truth table into a first lookup table and a second lookup table; the configuration module is further configured to send the first lookup table to the first module, and send the second lookup table to the second module.
  • by having the configuration module approximate the truth table corresponding to an objective function that does not meet the decomposition conditions, a truth table that meets the decomposition conditions can be obtained, and then the output value of the objective function can be solved based on multiple lookup tables.
  • f(x) is the objective function
  • F(φ(B), A) is the objective function or the approximated objective function
  • B is the first subset
  • A is the second subset
  • φ(B) is the first function.
  • cascading the first module and the second module can reduce the area, latency and energy consumption of the circuit.
  • the above-mentioned objective function is a function that does not satisfy a decomposition condition
  • the decomposition condition is a decomposition condition of a Boolean function corresponding to a truth table.
  • the objective function that does not satisfy the decomposition condition can be approximately decomposed to obtain multiple lookup tables, and then the output value of the objective function can be obtained according to the multiple lookup tables, the first module and the second module.
  • the above decomposition conditions of the truth table, whose rows are indexed by the second subset and whose columns are indexed by the first subset, include at least one of the following: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0 and 1; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
  • the second aspect of the embodiment of the present application provides a method for solving a function, the method is applied to a lookup table scenario, and the method includes: obtaining a first input sequence of an objective function, where the first input sequence includes at least two subsets, the at least two subsets include the first subset and the second subset, the objective function is a function that does not satisfy the decomposition condition, and the decomposition condition is the decomposition condition of the truth table corresponding to the Boolean function; determining the first lookup table and the second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset; determining the output value of the first function based on the first subset and the first lookup table, where the first function is a nested function in the objective function; and determining an output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  • the first lookup table and the second lookup table of the truth table corresponding to the objective function that does not satisfy the decomposition condition can be determined based on the first input sequence and the decomposition condition, and then the objective function can be solved based on multiple lookup tables output value.
  • the method further includes: obtaining a second input sequence of the objective function; and scrambling the ordering of the second input sequence to obtain the first input sequence.
  • the above steps further include: scrambling the ordering of the second input sequence to obtain a third input sequence; and determining that the first error is smaller than the second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
  • the input sequence can be scrambled multiple times, and the first input sequence can be determined based on the error between the output value corresponding to each scrambling and the real output value, so that the first lookup table and the second lookup table determined based on the first input sequence can realize the solution of the objective function.
  • f(x) is the objective function
  • F(φ(B), A) is the approximated objective function
  • B is the first subset
  • A is the second subset
  • φ(B) is the first function
  • cascading the first module and the second module can reduce the area, latency and energy consumption of the circuit.
  • the above step: determining the first lookup table and the second lookup table of the objective function based on the decomposition conditions of the first input sequence and the Boolean function includes: based on the first The input sequence and decomposition conditions are used to approximate the truth table of the objective function to obtain an approximated truth table; the approximated truth table is decomposed to obtain a first lookup table and a second lookup table.
  • the approximate calculation is aimed at fault-tolerant applications, and this technique introduces a small error into the system in exchange for reductions in circuit area, delay, and power consumption.
  • Approximate LUT is a technology that combines LUT operation and approximate calculation, and compared with accurate LUT, the storage overhead is greatly reduced.
  • the approximate LUT can approximate functions with many input numbers, and at the same time, the circuit area is small, the power consumption is low, and the delay is low.
  • the above decomposition conditions of the truth table, whose rows are indexed by the second subset and whose columns are indexed by the first subset, include at least one of the following: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0 and 1; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
  • the third aspect of the embodiment of the present application provides an electronic device, the electronic device is applied to a lookup table scenario, and the electronic device includes: an acquisition unit, configured to acquire the first input sequence of the objective function, where the first input sequence includes at least two subsets, the at least two subsets include the first subset and the second subset, the objective function is a function that does not satisfy the decomposition condition, and the decomposition condition is the decomposition condition of the Boolean function corresponding to the truth table; a first determination unit, configured to determine the first lookup table and the second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset; a second determination unit, configured to determine the output value of the first function based on the first subset and the first lookup table, where the first function is a nested function in the objective function; and a third determination unit, configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  • the above-mentioned acquiring unit is further configured to acquire a second input sequence of the objective function; the electronic device further includes: a scrambling unit configured to scramble the second input sequence Sorting of sequences to obtain the first input sequence.
  • the above-mentioned scrambling unit is also used to scramble the ordering of the second input sequence to obtain the third input sequence; the scrambling unit is specifically used to determine that the first error is smaller than the second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
  • f(x) is the objective function
  • F(φ(B), A) is the approximated objective function
  • B is the first subset
  • A is the second subset
  • φ(B) is the first function
  • the above-mentioned first determination unit is specifically configured to perform approximate processing on the truth table of the objective function based on the first input sequence and decomposition conditions, to obtain the approximated Truth table; the first determination unit is specifically used to decompose the approximated truth table to obtain a first lookup table and a second lookup table.
  • the above decomposition conditions of the truth table include at least one of the following: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0 and 1; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
  • a fourth aspect of the present application provides an electronic device, and the electronic device executes the method in the foregoing second aspect or any possible implementation manner of the second aspect.
  • the fifth aspect of the present application provides an electronic device, including: a processor, the processor is coupled with a memory, and the memory is used to store programs or instructions, and when the programs or instructions are executed by the processor, the electronic device realizes the above-mentioned second aspect Or the method in any possible implementation of the second aspect.
  • the sixth aspect of the present application provides a computer-readable medium, on which computer programs or instructions are stored, and when the computer programs or instructions are run on a computer, the computer executes the method in the aforementioned second aspect or any possible implementation of the second aspect.
  • a seventh aspect of the present application provides a computer program product. When the computer program product is executed on a computer, the computer executes the method in the foregoing second aspect or any possible implementation manner of the second aspect.
  • for the technical effects of the third, fifth, sixth and seventh aspects or any of their possible implementations, refer to the technical effects of the second aspect or of its possible implementations, which are not repeated here.
  • the circuit can solve the output value of the objective function by means of multiple lookup tables.
  • the first module and the second module can determine outputs of different functions based on different look-up tables. That is, the first module can determine the output of the first function based on the first subset and the first lookup table, and the second module can determine the output of the objective function based on the second subset, the second lookup table, and the output of the first function.
  • Decomposing the objective function into a Boolean function improves the efficiency of solving the objective function and can obtain multi-level logical decomposition results.
  • the area, delay and energy consumption of the circuit corresponding to the objective function can also be reduced.
  • Figure 1 is an example diagram of the accurate decomposition of Boolean functions
  • FIG. 2 is a schematic structural diagram of a system architecture provided in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a multi-level look-up table circuit provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a scrambling module provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a truth table approximation and decomposition process provided in the embodiment of the present application.
  • FIG. 6 is an example diagram of a solution to an optimization problem provided by an embodiment of the present application.
  • Fig. 7 is another schematic structural diagram of the multi-level look-up table circuit provided by the embodiment of the present application.
  • FIG. 8 is another example diagram of a solution to an optimization problem provided by an embodiment of the present application.
  • Figure 9 and Figure 10 are example diagrams of the effect of the multi-level look-up table circuit on the continuous function provided by the embodiment of the present application.
  • Figure 11 is an example diagram of the effect of the multi-level look-up table circuit on the discontinuous function provided by the embodiment of the present application.
  • Fig. 12 is a schematic diagram of the process of testing a multi-level look-up table circuit provided by the embodiment of the present application.
  • Fig. 13 is a schematic flow chart of the function solving method provided by the embodiment of the present application.
  • FIG. 14 is a schematic flow chart of the approximate processing method provided by the embodiment of the present application.
  • Fig. 15 is a comparison example diagram of a fifth output bit truth table before and after approximation provided by the embodiment of the present application.
  • Fig. 16 is a comparison example diagram before and after approximation of a fourth output bit truth table provided by the embodiment of the present application.
  • Fig. 17 is the curve of the accurate cosine function obtained based on the function solving method provided by the embodiment of the present application.
  • Fig. 18 is a comparison graph between the approximate cosine function and the exact cosine function obtained based on the function solving method provided by the embodiment of the present application;
  • FIG. 19 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 20 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Embodiments of the present application provide a multi-level lookup table circuit, a function solving method and related equipment. All functions can be decomposed into approximate Boolean functions, so as to solve the output value of the function through the cascade of the first module and the second module. In addition, the delay and energy consumption of the circuit corresponding to the function can be reduced by cascading multiple modules.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes X_s and the intercept b as input, and the output of the operation unit can be: f(Σ_s W_s·X_s + b)
  • W_s is the weight of X_s
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function may be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • a truth table is a table representing all possible states between inputs and outputs of a logical event.
  • a table listing the true and false values of a propositional formula. Usually 1 means true and 0 means false.
  • a lookup table (LUT) is essentially a random access memory (RAM).
  • 4-input LUTs are mostly used in field programmable gate arrays (FPGAs), so each LUT can be regarded as a 16×1 RAM with 4-bit address lines.
  • the programmable logic device (PLD)/FPGA development software automatically calculates all possible results of the logic circuit and writes the results to the RAM in advance. In this way, each time a signal is input for a logical operation, it is equivalent to inputting an address for table lookup, finding the content corresponding to the address, and then outputting it.
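  • as a minimal sketch of the "precompute, then look up by address" idea described above, the following Python fragment tabulates an arbitrary 4-input logic function into a 16-entry list and then evaluates it by indexing. The names `build_lut` and `example_logic`, and the choice of bit 0 as the first input, are illustrative assumptions and do not come from the application.

```python
from typing import Callable, List

def build_lut(func: Callable[..., int], num_inputs: int) -> List[int]:
    """Precompute func for every input combination; the address is the packed input bits."""
    lut = []
    for addr in range(1 << num_inputs):
        # Bit i of the address is the i-th input (assumed convention: input 0 is bit 0).
        bits = [(addr >> i) & 1 for i in range(num_inputs)]
        lut.append(func(*bits) & 1)
    return lut

def example_logic(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical 4-input logic circuit used only for illustration."""
    return (a & b) ^ (c | d)

lut = build_lut(example_logic, 4)   # behaves like a 16x1 RAM with 4 address lines
address = 0b0011                    # a=1, b=1, c=0, d=0
result = lut[address]               # table lookup replaces the logic evaluation
```

Once the table is filled, the logic function is never evaluated again; every query is a single indexed read.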
  • Theorem 1: Divide the input X of the Boolean function into sets A and B. The Boolean function f has a disjoint decomposition with respect to the free set A and the constrained set B if and only if all rows of the two-dimensional truth table, with A as the rows and B as the columns, fall into one of four categories:
  • Type 1 (Type1), all elements in the row are 0;
  • Type 2 (Type2), all elements in the row are 1;
  • Type 3 (Type3), the row is a feature vector containing both 0 and 1;
  • Type 4 (Type4), the row is the vector obtained by inverting the feature vector bit by bit.
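  • the row condition of Theorem 1 can be checked mechanically. The sketch below is an illustration only: it assumes the truth table is given as a list of rows (one row per value of A, one column per value of B), and the function name `classify_rows` is hypothetical. It returns the feature vector and the type of each row when the table is decomposable, and `None` otherwise.

```python
from typing import List, Optional, Tuple

def classify_rows(table: List[List[int]]) -> Optional[Tuple[List[int], List[int]]]:
    """Return (feature_vector, row_types) if every row is of Type 1-4, else None."""
    width = len(table[0])
    feature: Optional[List[int]] = None
    row_types: List[int] = []
    for row in table:
        row = list(row)
        if all(v == 0 for v in row):
            row_types.append(1)                    # Type 1: all zeros
        elif all(v == 1 for v in row):
            row_types.append(2)                    # Type 2: all ones
        else:
            if feature is None:
                feature = row                      # first mixed row defines the feature vector
            if row == feature:
                row_types.append(3)                # Type 3: equals the feature vector
            elif row == [1 - v for v in feature]:
                row_types.append(4)                # Type 4: bitwise inverse of the feature vector
            else:
                return None                        # a third distinct mixed pattern: not decomposable
    if feature is None:
        feature = [0] * width                      # only constant rows: any feature vector works
    return feature, row_types

decomposable = [[0, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1]]
not_decomposable = [[0, 1, 1, 0], [0, 0, 1, 1]]
```

Which of a vector and its inverse is called the feature vector is arbitrary; the sketch simply takes the first mixed row it meets.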
  • embodiments of the present application provide a multi-level look-up table circuit, a function solving method and related equipment. All functions can be decomposed into approximate Boolean functions, so as to solve the output value of the function through the cascade of the first module and the second module. In addition, the area, delay and energy consumption of the circuit corresponding to the function can be reduced by cascading multiple modules.
  • FIG. 2 is a system architecture diagram of a circuit application provided by an embodiment of the present application.
  • the system architecture diagram shown in FIG. 2 includes a control unit 201 and a multi-level look-up table unit 202 connected to the control unit 201 .
  • control unit 201 reads the input of the function, and the control unit 201 looks up the table from the multi-level look-up table unit 202 to obtain the output value of the function.
  • system architecture may further include an arithmetic logic unit (ALU) 203.
  • the arithmetic logic unit 203 is connected with the control unit 201.
  • the operation flow of the system can be adjusted as follows: the control unit 201 reads the input of the function, the control unit 201 obtains the feature information (such as derivative information) of the function by table lookup from the multi-level look-up table unit 202, and the feature information and the function input are operated on to obtain the output of the function.
  • system architectures with an ALU are generally suitable for exact-LUT decomposition.
  • System architectures that do not have an ALU are generally suitable for approximate-LUT decomposition. The exact LUT and the approximate LUT are described later, and details are not repeated here.
  • the system architecture provided in the embodiments of the present application can be applied to scenarios such as optical modules, wireless networks, and neural networks, and is not specifically limited here.
  • the system may also be referred to as an accelerated computing system.
  • the multi-level look-up table unit, also called a multi-level look-up table circuit, is described below.
  • the multi-level lookup table circuit provided in the embodiment of the present application can be applied to multiple-input-single-output scenarios, and can also be applied to multiple-input-multiple-output scenarios, which are described separately below:
  • taking a two-level look-up table circuit as an example, the multi-level look-up table circuit provided in the embodiment of the present application may be as shown in FIG. 3.
  • the multi-level lookup table circuit is used to solve the output value of the objective function based on multiple lookup tables, the multiple lookup tables include a first lookup table and a second lookup table, and the first input sequence of the objective function includes a first subset and a second subset.
  • the first subset can be understood as a subsequence in the first input sequence, and the second subset can be understood as another subsequence in the first input sequence.
  • the multi-level look-up table circuit includes a first module 301 and a second module 302 .
  • the first module 301 is configured to determine the output value of the first function based on the first subset and the first lookup table, and the first function is a nested function in the objective function;
  • the second module 302 is configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  • the first input sequence: X' = {x'_n, ..., x'_1}
  • the first subset: B = {x'_b, ..., x'_1}
  • the second subset: A = {x'_n, ..., x'_{b+1}}
  • φ(B) is the first function.
  • the number of inputs to the first lookup table is b
  • the number of inputs to the second lookup table is n − b + 1.
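  • to make the input counts above concrete, the following sketch evaluates a decomposable 4-input function (n = 4, b = 2) through two cascaded tables. It is an illustration under stated assumptions, not the circuit of the application: B is taken as the low b bits of the input word, A as the high n − b bits, and all identifiers (`lut0`, `lut1`, `evaluate`, `F_BY_TYPE`) are hypothetical.

```python
N, B_BITS = 4, 2

# Two-dimensional truth table: rows indexed by A (high bits), columns by B (low bits).
table = [
    [0, 0, 0, 0],  # Type 1
    [0, 1, 1, 0],  # Type 3 (the feature vector)
    [1, 0, 0, 1],  # Type 4 (bitwise inverse of the feature vector)
    [1, 1, 1, 1],  # Type 2
]
feature = [0, 1, 1, 0]
row_types = [1, 3, 4, 2]

# For each row type: the value of F when phi(B) = 0 and when phi(B) = 1.
F_BY_TYPE = {1: (0, 0), 2: (1, 1), 3: (0, 1), 4: (1, 0)}

lut0 = list(feature)                                      # first LUT: b inputs, 2**b entries
lut1 = [bit for t in row_types for bit in F_BY_TYPE[t]]   # second LUT: n-b+1 inputs

def evaluate(x: int) -> int:
    """Compute F(phi(B), A) with two table lookups."""
    b_index = x & ((1 << B_BITS) - 1)      # first subset B
    a_index = x >> B_BITS                  # second subset A
    phi = lut0[b_index]                    # first module: phi(B)
    return lut1[(a_index << 1) | phi]      # second module: F(phi(B), A)
```

In this example the two tables hold 4 + 8 = 12 bits, compared with 16 bits for a single 4-input table; the gap widens quickly as n grows.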
  • the multi-level look-up table circuit can reduce the area, delay and energy consumption of the circuit corresponding to the objective function through the cascade connection of the first module and the second module.
  • the first module and the second module can realize the decomposition of approximate Boolean functions.
  • the multi-level lookup table circuit may also include a scrambling module 303, which is used to obtain the second input sequence of the objective function, scramble the ordering of the second input sequence to obtain the first input sequence, and decompose the first input sequence to obtain the first subset and the second subset.
  • the scrambling module 303 is further configured to send the first subset to the first module 301 and send the second subset to the second module 302.
  • the first b elements of the first input sequence are input into the first module 301 as the first subset B, and the remaining n − b elements are input into the second module 302 as the second subset A.
  • the scrambling module 303 can be realized by using n n-to-1 multiplexers (MUX), and the j-th MUX is controlled by the signal S_j to determine the specific scrambling.
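  • the MUX-based scrambling can be modelled in software as one index selection per output position. The sketch below is a behavioural model only; it assumes each select signal S_j is given as an integer index into the input list and that the selects together form a permutation, and the names `scramble` and `split` are hypothetical.

```python
from typing import List, Tuple

def scramble(bits: List[int], selects: List[int]) -> List[int]:
    """Model n n-to-1 MUXes: output j forwards the input chosen by select signal S_j."""
    return [bits[s] for s in selects]

def split(scrambled: List[int], b: int) -> Tuple[List[int], List[int]]:
    """First b elements form subset B (first module); the rest form subset A (second module)."""
    return scrambled[:b], scrambled[b:]

second_input_sequence = [1, 0, 1, 1]
selects = [2, 0, 3, 1]                       # S_1..S_n as 0-based indices (assumed encoding)
first_input_sequence = scramble(second_input_sequence, selects)
subset_b, subset_a = split(first_input_sequence, 2)
```

In hardware each select would be a log2(n)-bit control word written by the configuration logic; here it is simply a list index.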
  • the objective function in this embodiment of the present application may be a continuous function and a part of discontinuous functions that can be decomposed by Boolean functions. It may also be a function that does not satisfy the decomposition condition, which is the decomposition condition of the Boolean function corresponding to the truth table. Wherein, the decomposition condition of the truth table is as described in the above-mentioned Theorem 1, and details are not repeated here.
  • the multi-level lookup table circuit may also include a configuration module 304 for approximating the truth table corresponding to the objective function to obtain an approximated truth table, and decomposing the approximated truth table into a first lookup table and a second lookup table.
  • the configuration module 304 is further configured to send the first lookup table to the first module 301 and send the second lookup table to the second module 302 .
  • the configuration module 304 can also be understood as being used to configure the first lookup table and the second lookup table: it writes the input ordering information to the scrambling module 303 and, according to a certain sequence, writes the data of the approximate LUT to the first module and the second module.
  • the above-mentioned scrambling module 303 can also be used to scramble the second input sequence multiple times to obtain different scrambled input sequences. After output values are obtained based on the different scrambled input sequences, the error between each output value and the actual output of the objective function is compared, and the scrambled input sequence with the minimum error is determined to be the first input sequence.
  • the two-dimensional truth table in (b) in Figure 5 satisfies Theorem 1, so that the original objective function can be approximately decomposed into two functions F and φ.
  • the decomposed first lookup table LUT0 and second lookup table LUT1 are shown in FIG. 5 .
  • the approximation processing in this embodiment can be understood as an optimization problem, the optimization goal is to minimize the error rate, and the variables to be solved are the feature vector V and the category vector T.
  • a given pair of feature vector V and category vector T corresponds to a unique approximate truth table.
  • the truth table may be a multidimensional truth table, and here only a two-dimensional truth table is used as an example for description.
  • Step 1 Randomly initialize the feature vector V.
  • Step 2 Alternately optimize the category vector T and feature vector V.
  • This step 2 may include the following 3 sub-steps:
  • Step 2.1 fix the feature vector V and change the category vector T to minimize the error rate
  • Step 2.2 fix the category vector T and change the feature vector V to minimize the error rate
  • Step 2.3 if neither V nor T changes, go to step 3, otherwise go to step 2.1.
  • Step 3 determine an approximate two-dimensional truth table according to the feature vector V and the category vector T.
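  • steps 1 to 3 can be sketched as a small alternating minimization. The code below is an illustration under assumptions that are not stated in the source: every input combination is taken to be equally likely, so the error rate is proportional to the number of mismatched table entries; ties keep the current value; and an iteration cap is added as a safeguard. All function names are hypothetical.

```python
import random
from typing import List, Tuple

def row_pattern(row_type: int, V: List[int]) -> List[int]:
    """Row implied by a category (1-4) and the feature vector V."""
    if row_type == 1:
        return [0] * len(V)
    if row_type == 2:
        return [1] * len(V)
    if row_type == 3:
        return list(V)
    return [1 - v for v in V]

def mismatches(row: List[int], pattern: List[int]) -> int:
    return sum(a != p for a, p in zip(row, pattern))

def approximate(table: List[List[int]], seed: int = 0,
                max_iters: int = 100) -> Tuple[List[int], List[int], List[List[int]]]:
    rng = random.Random(seed)
    cols = len(table[0])
    V = [rng.randint(0, 1) for _ in range(cols)]          # Step 1: random feature vector
    T = [1] * len(table)
    for _ in range(max_iters):
        # Step 2.1: fix V, pick the best category for every row.
        new_T = [min((1, 2, 3, 4), key=lambda t: mismatches(row, row_pattern(t, V)))
                 for row in table]
        # Step 2.2: fix T, pick the best bit of V for every column.
        new_V = []
        for j in range(cols):
            cost = [0, 0]                                  # cost of V[j] = 0 and of V[j] = 1
            for row, t in zip(table, new_T):
                if t == 3:
                    cost[0] += row[j] != 0
                    cost[1] += row[j] != 1
                elif t == 4:
                    cost[0] += row[j] != 1
                    cost[1] += row[j] != 0
            new_V.append(V[j] if cost[0] == cost[1] else (0 if cost[0] < cost[1] else 1))
        if new_T == T and new_V == V:                      # Step 2.3: nothing changed
            break
        T, V = new_T, new_V
    approx = [row_pattern(t, V) for t in T]                # Step 3: approximate truth table
    return V, T, approx

example_table = [[0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1], [0, 0, 0, 0]]
V, T, approx_table = approximate(example_table)
```

Like any alternating scheme, this can stop in a local minimum that depends on the random initial V, so running it from several seeds and keeping the best result is a natural extension.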
  • the multi-level look-up table circuit provided in the embodiment of the present application may be shown in FIG. 7 .
  • the multi-level look-up table circuit includes m modules (the first module, ..., the m-th module), where each module can be understood as the multi-level look-up table circuit shown in FIG. 3.
  • the number of multi-level look-up table circuits corresponds to the number of outputs.
  • the approximation processing in this embodiment can be understood as an optimization problem: the optimization goal is to minimize the normalized mean of error distance (NMED), and the variables to be solved are the feature vector V and the category vector T.
  • the feature vector V and the category vector T are the same as the feature vector V and the category vector T in the aforementioned multiple-input-single-output scenario, which will not be repeated here.
  • the objective function has I inputs and O outputs; p i is the probability that the i-th input combination appears; and the two compared values are, respectively, the output of the approximated function (also called the approximate function) and the output g i of the target function (also called the accurate function) under the i-th input combination, each formed by concatenating all output bits as a binary number.
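The NMED described above can be computed as in the following Python sketch. The formula itself is not reproduced in this text, so the normalization by the largest representable output, 2^O - 1, is an assumption taken from the common definition of NMED in approximate computing; the function name `nmed` and its parameters are hypothetical.

```python
def nmed(approx_outputs, exact_outputs, probs, num_output_bits):
    """Normalized mean error distance between two output lists.

    approx_outputs / exact_outputs: integer outputs (all output bits
    concatenated as a binary number) for each input combination i.
    probs: probability p_i of each input combination.
    Assumption: the mean error distance is normalized by 2^O - 1.
    """
    max_value = (1 << num_output_bits) - 1
    mean_error_distance = sum(
        p * abs(a - g)
        for p, a, g in zip(probs, approx_outputs, exact_outputs)
    )
    return mean_error_distance / max_value
```

For a 2-bit output with four equally likely input combinations, a single output that is off by one gives a mean error distance of 0.25 and an NMED of 0.25 / 3.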
  • the optimization process can be decomposed into sequentially optimizing the objective function on each output bit from the output of the objective function according to the highest bit to the lowest bit of the binary system.
  • the approximate functions of the bits other than the kth bit are fixed, and the optimal scrambling method on the kth output bit, together with the optimal feature vector V and category vector T, is determined so that the NMED between the approximated function and the objective function is minimized.
  • the above multi-output optimization problem may be as shown in FIG. 8 .
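  • when the highest bit is optimized, the truth tables of the middle bit and the lowest bit are fixed.
  • when the middle bit is optimized, the truth tables of the highest bit and the lowest bit are fixed.
  • when the lowest bit is optimized, the truth tables of the highest bit and the middle bit are fixed.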
  • the circuit can solve the output value of the objective function by means of multiple lookup tables.
  • the first module and the second module can determine outputs of different functions based on different look-up tables. That is, the first module can determine the output of the first function based on the first subset and the first lookup table, and the second module can determine the output of the objective function based on the second subset, the second lookup table, and the output of the first function.
  • Performing Boolean function decomposition on the objective function improves the efficiency of multi-level logic function decomposition and yields multi-level logic decomposition results.
  • the area, delay and energy consumption of the circuit corresponding to the objective function can also be reduced.
  • by approximating the truth table of a function that does not meet the Boolean function decomposition conditions and then decomposing the approximated truth table, the scheme can be applied to the decomposition of all functions, so that the objective function can be solved through logic circuits. In other words, small errors are introduced into the system in exchange for reductions in circuit area, delay and power consumption. Compared with exact LUTs, approximate processing reduces storage overhead. It can approximate functions with many inputs while keeping the circuit area, power consumption and delay low.
  • the following describes the performance of the two-level look-up table circuit and the prior art applied to continuous functions and discontinuous functions respectively.
  • the prior art includes a LUT scheme (Round) in which the lowest bit is rounded and an approximate LUT (ApproxLUT) in which a derivative is stored.
  • the selection of the continuous function can be shown in Table 1. The input sequence and output value of the continuous function are quantized to 16 bits, where the size of the constraint set is 9 and the number of rows of the corresponding first-level LUT is 512. The size of the free set is 7, and the number of rows of the corresponding second-level LUT is 256. With this configuration, the continuous function is approximately decomposed.
  • a decomposition-based approximate lookup table architecture (DALTA) is the multi-level lookup table circuit provided by the embodiment of the present application.
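The row counts quoted above follow from simple arithmetic, sketched below in Python. One reading that is consistent with the stated numbers, and that is an assumption here, is that the first-level LUT is indexed by the constraint set alone, while the second-level LUT is indexed by the free set plus the one-bit output of the first function. The helper name `lut_rows` is hypothetical, and the counts are per output bit.

```python
def lut_rows(n_inputs, constraint_size):
    """Row counts for an exact LUT and for a two-level decomposition.

    Assumption: the second-level LUT is addressed by the free-set bits
    plus the 1-bit output of the first-level LUT.
    """
    free_size = n_inputs - constraint_size
    exact_rows = 2 ** n_inputs               # one row per input combination
    level1_rows = 2 ** constraint_size       # first-level LUT
    level2_rows = 2 ** (free_size + 1)       # second-level LUT
    return exact_rows, level1_rows, level2_rows

# 16 input bits, constraint set of 9 bits, free set of 7 bits.
print(lut_rows(16, 9))
```

With 16 input bits, an exact table needs 65536 rows, whereas the two levels together need 512 + 256 = 768 rows.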
  • on the premise of reducing the error, the multi-level look-up table circuit provided by this embodiment reduces the area (Area) by 97.5%, the delay (Latency) by 40.7% and the energy consumption (Energy) by 99%.
  • DALTA reduces the delay by 92.4% and the energy consumption by 56.5% compared with ApproxLUT.
  • the selection of the discontinuous function can be shown in Table 2.
  • the input sequence of the discontinuous function is quantized to 16 bits, wherein the size of the constraint set is 9, and the number of rows of the corresponding first-level LUT is 512.
  • the size of the free set is 7, and the number of rows of the corresponding second-level LUT is 256. With this configuration, the discontinuous function is approximately decomposed.
  • the effect of the multi-level look-up table circuit provided by the embodiment of the present application is further described by taking the cosine function as an example as the objective function.
  • the process can be shown in FIG. 12: the number of output bits m is fixed at 16, and the number of input bits n increases from 8 to 16. The size of the free set and the size of the constraint set are each determined as n/2 and then rounded (up or down), and a multi-level lookup table circuit is then generated according to the parameters m and n, the free set and the constraint set. The error, area, delay and power consumption of the multi-level look-up table circuit are then tested. The test results are shown in Table 4.
  • means: the improvement relative to the accurate LUT. It can be seen from Table 4 that as the number of inputs increases, the error NMED decreases gradually. Compared with the accurate LUT, the area and power consumption of this application decrease exponentially.
  • the embodiments of the present application provide a method for solving a function and related equipment. All functions can be decomposed into approximate Boolean functions to obtain at least two lookup tables, so that the output value of the function can be obtained by solving the at least two lookup tables.
  • the function solving method provided in the embodiment of the present application can be applied to scenarios such as optical modules, wireless networks, and neural networks that are suitable for solving functions of a lookup table, and the details are not limited here.
  • the method may be executed by an electronic device, or may be executed by a component of the electronic device (e.g., a processor, a chip, or a chip system), which is not specifically limited here.
  • the method includes steps 1301 to 1304 , which are described below respectively.
  • Step 1301 obtain the first input sequence of the objective function.
  • the second input sequence of the objective function may also be obtained, and the sorting in the second input sequence is disturbed to obtain the first input sequence.
  • the second input sequence may also be scrambled to obtain the third input sequence, and it is determined that the first error is smaller than the second error.
  • the first error is an error between the output value of the objective function obtained based on the first input sequence and the actual output value of the objective function.
  • the second error is an error between the output value of the objective function obtained based on the third input sequence and the actual output value of the objective function.
  • the input sequence obtained by determining the optimal scrambling scheme is the first input sequence.
  • the first input sequence in the embodiment of the present application includes at least two subsets, the at least two subsets include the first subset and the second subset, the objective function is a function that does not satisfy the decomposition condition, and the decomposition condition is the decomposition condition of the truth table corresponding to a Boolean function.
  • the decomposition condition of the truth table can be understood as the above-mentioned Theorem 1, and details will not be repeated here.
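Selecting the first input sequence by comparing errors across scrambled orders can be sketched as below. The sketch rests on two assumptions: that scrambling amounts to choosing which input bit positions form the first subset, and that exhaustive enumeration is affordable for the chosen sizes (the application does not say how candidates are searched). The callable `error_of` is a hypothetical, caller-supplied function that returns the error of the approximation obtained for a given split.

```python
from itertools import combinations

def best_partition(n_inputs, first_size, error_of):
    """Try every split of the input bits and keep the lowest-error one.

    error_of(first, second) is supplied by the caller (hypothetical);
    it returns the error between the approximated and the actual
    output for that split. Returns (error, first_subset, second_subset).
    """
    best = None
    for first in combinations(range(n_inputs), first_size):
        second = tuple(i for i in range(n_inputs) if i not in first)
        err = error_of(first, second)
        if best is None or err < best[0]:    # keep the smallest error
            best = (err, first, second)
    return best
```

For inputs with many bits the number of splits grows combinatorially, so a sampled or heuristic search would be a natural substitute for the full enumeration.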
  • Step 1302 determine a first lookup table and a second lookup table of the objective function based on the first input sequence and decomposition conditions.
  • a first lookup table and a second lookup table of the objective function may be determined based on the first input sequence and decomposition conditions.
  • the first lookup table is related to the first subset
  • the second lookup table is related to the second subset.
  • This step may specifically include: performing approximate processing on the truth table of the objective function based on the first input sequence and decomposition conditions to obtain an approximated truth table, and the approximated truth table satisfies the decomposition conditions. Therefore, the approximated truth table can be decomposed to obtain the first lookup table and the second lookup table.
  • Step 1303 determine the output value of the first function based on the first subset and the first lookup table.
  • the output value of the first function may be determined based on the first subset and the first lookup table.
  • the first function can be understood as a nested function of the objective function.
  • Step 1304 determine the output value of the objective function based on the second subset, the second lookup table and the output value of the first function.
  • the output value of the objective function may be determined based on the second subset, the second lookup table, and the output value of the first function. Since the output value is obtained by decomposing multiple lookup tables through the approximated truth table, the output value can be understood as the approximate output value of the objective function, and can also be understood as the output value of the approximated objective function.
  • the expression of the objective function can be as follows:
  • f(x)≈F(Φ(B),A);
  • f(x) is the objective function
  • F(Φ(B),A) is the approximated objective function
  • B is the first subset
  • A is the second subset
  • Φ(B) is the first function
  • At least two lookup tables can be obtained by decomposing an approximate Boolean function on an objective function that does not meet the decomposition conditions, so as to obtain the output value of the objective function (which can also be understood as an approximate value) through at least two lookup tables.
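A table that already satisfies the decomposition condition can be split into two lookup tables and evaluated in two cascaded steps, as in the Python sketch below. It assumes a single output bit, rows indexed by the second subset A and columns by the first subset B; the function names `decompose` and `evaluate` are hypothetical and do not come from the application.

```python
def decompose(table):
    """Split a decomposable 0/1 table into (lut0, lut1).

    lut0[b] is the first function Phi(B); lut1[a] is a pair
    (F(0, a), F(1, a)) selected by the value of Phi(B).
    """
    n_cols = len(table[0])
    # Use the first non-constant row as the feature vector V.
    V = next((list(r) for r in table if 0 < sum(r) < n_cols), [0] * n_cols)
    inverse = [1 - v for v in V]
    lut1 = []
    for row in table:
        row = list(row)
        if sum(row) == 0:
            lut1.append((0, 0))        # row of all zeros
        elif sum(row) == n_cols:
            lut1.append((1, 1))        # row of all ones
        elif row == V:
            lut1.append((0, 1))        # row equals Phi
        elif row == inverse:
            lut1.append((1, 0))        # row equals NOT Phi
        else:
            raise ValueError("table does not satisfy the decomposition condition")
    return V, lut1

def evaluate(lut0, lut1, a, b):
    """Two-level lookup: the first module, then the second module."""
    phi = lut0[b]            # output value of the first function
    return lut1[a][phi]      # output value of the objective function
```

For example, the 4x4 table `[[0,0,0,0],[0,1,1,0],[1,0,0,1],[1,1,1,1]]` yields `lut0 = [0,1,1,0]`, and `evaluate` then reproduces every entry of the table.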
  • the objective function is the cosine function
  • the input sequence of the cosine value is 5 bits (x 5 , x 4 , x 3 , x 2 , x 1 )
  • the output value is 5 bits (y 5 , y 4 , y 3 , y 2 , y 1 ).
  • the approximation process can be shown in Figure 14, and the approximation process includes steps 1401 to 1404, which are described below:
  • Step 1401 decompose the highest bit (bit 5), and fix the remaining bits (bit 1-4).
  • Step 1402 decompose the 4th digit, and fix the remaining digits (1-3 digits and 5th digit).
  • the present embodiment uses the exact functions of bits 1 to 3 as estimates of the approximate functions on the corresponding bits.
  • the approximate function on the 5th bit has already been obtained in the first step.
  • this embodiment finds the optimal scrambling method on the 4th bit, that is, the free set is {x 3 , x 4 , x 5 } and the constraint set is {x 1 , x 2 }; with this scrambling method, the exact two-dimensional truth table and the approximate two-dimensional truth table differ at one position on the 4th bit.
  • Step 1403 decompose the 3rd, 2nd, and 1st bits respectively.
  • Step 1404 repeat steps 1401 to 1403 until the preset condition is met.
  • the preset conditions in this method include that the value of NMED does not drop or the difference between the approximate output value of the function and the real output value is smaller than the preset threshold.
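The outer loop of steps 1401 to 1404 can be outlined as follows. This is only a skeleton under stated assumptions: the per-bit decomposition is passed in as a hypothetical callable `optimize_bit(k)`, the current error as a hypothetical callable `nmed_of()`, and the threshold test is applied to the NMED as a simplification of the second preset condition. The `max_rounds` guard is an addition of this sketch.

```python
def iterate_until_converged(num_bits, optimize_bit, nmed_of,
                            threshold=0.0, max_rounds=50):
    """Repeat the bit-by-bit decomposition until a stop condition holds.

    optimize_bit(k): hypothetical callable that re-decomposes output
        bit k while the other bits stay fixed.
    nmed_of(): hypothetical callable returning the current NMED.
    """
    best = nmed_of()
    for _ in range(max_rounds):
        # Steps 1401 to 1403: from the highest bit down to the lowest bit.
        for k in range(num_bits, 0, -1):
            optimize_bit(k)
        current = nmed_of()
        # Step 1404: stop when the NMED no longer drops, or when it is
        # already at or below the preset threshold.
        if current >= best or current <= threshold:
            best = min(best, current)
            break
        best = current
    return best
```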
  • An embodiment of the electronic device in the embodiment of the present application includes:
  • the acquisition unit 1901 is configured to acquire a first input sequence of the objective function, the first input sequence includes at least two subsets, the at least two subsets include the first subset and the second subset, the objective function is a function that does not satisfy the decomposition condition,
  • the decomposition condition is the decomposition condition of the Boolean function corresponding to the truth table;
  • the first determination unit 1902 is configured to determine a first lookup table and a second lookup table of the objective function based on the first input sequence and decomposition conditions, the first lookup table is related to the first subset, and the second lookup table is related to the second subset;
  • the second determining unit 1903 is configured to determine the output value of the first function based on the first subset and the first lookup table, and the first function is a nested function in the objective function;
  • the third determining unit 1904 is configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  • the electronic device may further include a shuffling unit 1905, configured to shuffle the order of the second input sequence to obtain the first input sequence.
  • the operations performed by each unit in the electronic device are similar to those described in the foregoing embodiments shown in FIG. 13 to FIG. 18 , and will not be repeated here.
  • At least two lookup tables can be obtained by performing approximate Boolean function decomposition on an objective function that does not meet the decomposition conditions, so that the second determination unit 1903 and the third determination unit 1904 can solve for the output value of the objective function (which can also be understood as an approximate value) through the at least two lookup tables.
  • Referring to FIG. 20 , it is a schematic structural diagram of another electronic device provided by the present application.
  • the electronic device may include a processor 2001 , a memory 2002 and a communication interface 2003 .
  • the processor 2001, the memory 2002 and the communication interface 2003 are interconnected through lines. Wherein, program instructions and data are stored in the memory 2002 .
  • the memory 2002 stores program instructions and data corresponding to the steps executed by the electronic device in the corresponding embodiments shown in FIGS. 13 to 18 .
  • the processor 2001 is configured to execute the steps performed by the electronic device shown in any one of the above embodiments shown in FIG. 13 to FIG. 18 .
  • the communication interface 2003 may be used for receiving and sending data, and for performing steps related to acquiring, sending, and receiving in any of the embodiments shown in FIGS. 13 to 18 .
  • the electronic device may include more or fewer components than those shown in FIG. 20 , which is only an example in the present application and not limited thereto.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be fully or partially realized by software, hardware, firmware or any combination thereof.
  • the integrated units When the integrated units are implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), etc.

Abstract

A multi-level lookup table circuit, the circuit being applicable to scenarios such as optical modules, wireless, and neural networks. The circuit can be used in the described scenarios to solve for an output value of an objective function on the basis of multiple lookup tables, the multiple lookup tables comprise a first lookup table (LUT0) and a second lookup table (LUT1), and a first input sequence of the objective function comprising a first subset and a second subset. The circuit comprises a first module (301) and a second module (302). The first module (301) is used to determine an output value of a first function on the basis of the first subset and the first lookup table (LUT0), the first function being a nested function in the objective function. The second module (302) is used to determine an output value of the objective function on the basis of the second subset, the second lookup table (LUT1) and the output value of the first function. By means of a cascading connection between the first module (301) and the second module (302), the area, delay and energy consumption of the circuit corresponding to the objective function can be reduced.

Description

A multi-level lookup table circuit, function solving method and related device
This application claims priority to Chinese Patent Application No. 202111283778.4, filed with the China Patent Office on November 1, 2021 and entitled "A multi-level lookup table circuit, function solving method and related device", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a multi-level lookup table circuit, a function solving method and a related device.
Background
As transistor sizes shrink to the nanometer scale, how to further reduce the power consumption of computing systems is attracting more and more attention. Computing with a lookup table (look up table, LUT) is a popular low-power technique: the results of commonly used functions are calculated in advance and stored in the LUT, and during operation the result is obtained by looking up the table.
However, the number of rows of a LUT grows exponentially with the number of function inputs, so circuits that use this technique have a large area, high delay and high power consumption.
Therefore, how to reduce the area, delay and power consumption of the circuit corresponding to the LUT is a technical problem to be solved urgently.
Summary of the Invention
Embodiments of the present application provide a multi-level lookup table circuit, a function solving method and a related device. Approximate Boolean function decomposition can be performed on all functions, so that the output value of a function is solved through the cascade of the first module and the second module. In addition, cascading multiple modules can reduce the delay and energy consumption of the circuit corresponding to the function.
The first aspect of the embodiments of the present application provides a multi-level lookup table circuit, which can be applied to scenarios such as optical modules, wireless networks and neural networks. In these scenarios the circuit can be used to solve for the output value of an objective function based on multiple lookup tables. The multiple lookup tables include a first lookup table and a second lookup table, and the first input sequence of the objective function includes a first subset and a second subset; the circuit includes a first module and a second module. The first module is configured to determine the output value of a first function based on the first subset and the first lookup table, the first function being a nested function in the objective function. The second module is configured to determine the output value of the objective function based on the second subset, the second lookup table and the output value of the first function.
In the embodiments of the present application, the circuit can solve for the output value of the objective function by means of multiple lookup tables. The first module and the second module can determine the outputs of different functions based on different lookup tables. That is, the first module can determine the output of the first function based on the first subset and the first lookup table, and the second module can determine the output of the objective function based on the second subset, the second lookup table and the output of the first function. Performing Boolean function decomposition on the objective function improves the efficiency of solving the objective function and yields multi-level logic decomposition results. In addition, cascading the first module and the second module can also reduce the area, delay and energy consumption of the circuit corresponding to the objective function.
Optionally, in a possible implementation of the first aspect, the above circuit further includes a scrambling module. The scrambling module is configured to obtain a second input sequence of the objective function, scramble the order of the second input sequence to obtain the first input sequence, and split the first input sequence into the first subset and the second subset. The scrambling module is further configured to send the first subset to the first module and the second subset to the second module.
In this possible implementation, the scrambling module reorders the second input sequence; the more times the sequence is reordered, the more candidate forms of the objective function are obtained and the more likely it is that the objective function is closely approximated, so that the subsequent approximation of the truth table can achieve the expressive capability of the objective function.
Optionally, in a possible implementation of the first aspect, the above circuit further includes a configuration module. The configuration module is configured to perform approximation processing on the truth table corresponding to the objective function to obtain an approximated truth table, and to decompose the approximated truth table into the first lookup table and the second lookup table. The configuration module is further configured to send the first lookup table to the first module and the second lookup table to the second module.
In this possible implementation, the configuration module performs approximation processing on the truth table corresponding to an objective function that does not satisfy the decomposition condition, so that a truth table that does not satisfy the decomposition condition is approximated by one that does, and the output value of the objective function can then be solved based on multiple lookup tables.
Optionally, in a possible implementation of the first aspect, the expression of the above objective function is as follows:
f(x)=F(Φ(B),A);
where f(x) is the objective function, F(Φ(B),A) is the objective function or the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
In this possible implementation, the objective function is decomposed into two functions and its output value is determined through two lookup tables; cascading the first module and the second module saves circuit area, delay and energy consumption.
Optionally, in a possible implementation of the first aspect, the above objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is the decomposition condition of the truth table corresponding to a Boolean function.
In this possible implementation, an objective function that does not satisfy the decomposition condition can be approximately decomposed to obtain multiple lookup tables, and the output value of the objective function can then be obtained from the multiple lookup tables, the first module and the second module.
Optionally, in a possible implementation of the first aspect, the rows of the truth table correspond to the second subset and the columns correspond to the first subset, and the decomposition condition of the truth table includes at least one of the following: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0 and 1; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
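The row condition listed above can be checked mechanically once the two-dimensional truth table has been built for a chosen split of the inputs. The following Python sketch shows one way to do so for a single output bit. It assumes 0-based bit positions (the application numbers inputs from x 1), an objective function given as a Python callable on integers, and hypothetical helper names `build_table` and `satisfies_condition`.

```python
def build_table(f, n_inputs, first_subset):
    """Build the 2D truth table of a one-bit function f.

    Rows enumerate assignments of the second subset A, columns enumerate
    assignments of the first subset B. first_subset lists the bit
    positions (0-based, an assumption of this sketch) that belong to B.
    """
    second_subset = [i for i in range(n_inputs) if i not in first_subset]

    def scatter(value, positions):
        # Place bit j of value at input position positions[j].
        return sum(((value >> j) & 1) << p for j, p in enumerate(positions))

    return [[f(scatter(a, second_subset) | scatter(b, first_subset))
             for b in range(1 << len(first_subset))]
            for a in range(1 << len(second_subset))]

def satisfies_condition(table):
    """True if every row is all 0, all 1, a common vector V, or NOT V."""
    n_cols = len(table[0])
    mixed = [row for row in table if 0 < sum(row) < n_cols]
    if not mixed:
        return True                      # only constant rows
    V = mixed[0]
    inverse = [1 - v for v in V]
    return all(row == V or row == inverse for row in mixed)
```

As a usage example, for f = x0 XOR (x1 AND x2), choosing B = {x1, x2} gives rows [0,0,0,1] and [1,1,1,0], which satisfy the condition, whereas choosing B = {x0, x1} gives rows [0,1,0,1] and [0,1,1,0], which do not. This illustrates why the choice of split matters.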
The second aspect of the embodiments of the present application provides a function solving method, which is applied to lookup table scenarios. The method includes: obtaining a first input sequence of an objective function, where the first input sequence includes at least two subsets, the at least two subsets include a first subset and a second subset, the objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is the decomposition condition of the truth table corresponding to a Boolean function; determining a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset; determining the output value of a first function based on the first subset and the first lookup table, where the first function is a nested function in the objective function; and determining the output value of the objective function based on the second subset, the second lookup table and the output value of the first function.
In the embodiments of the present application, the first lookup table and the second lookup table can be determined, based on the first input sequence and the decomposition condition, for the truth table corresponding to an objective function that does not satisfy the decomposition condition, so that the output value of the objective function can be solved based on multiple lookup tables.
Optionally, in a possible implementation of the second aspect, before the step of obtaining the first input sequence of the objective function, the method further includes: obtaining a second input sequence of the objective function; and scrambling the order of the second input sequence to obtain the first input sequence.
In this possible implementation, reordering the second input sequence allows the subsequent approximation of the truth table to achieve the expressive capability of the objective function.
Optionally, in a possible implementation of the second aspect, the above steps further include: scrambling the order of the second input sequence to obtain a third input sequence; and determining that a first error is smaller than a second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
In this possible implementation, the input sequence can be scrambled multiple times, and the first input sequence is determined based on the errors between the output values corresponding to the different scrambled orders and the real output value, so that the first lookup table and the second lookup table determined based on the first input sequence can be used to solve the objective function.
Optionally, in a possible implementation of the second aspect, the expression of the objective function is as follows:
f(x)≈F(Φ(B),A);
where f(x) is the objective function, F(Φ(B),A) is the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
In this possible implementation, the objective function is decomposed into two functions and its output value is determined through two lookup tables; cascading the first module and the second module reduces the area, delay, and energy consumption of the circuit.
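The cascaded evaluation of F(Φ(B),A) can be illustrated with a minimal Python sketch. It assumes a single-bit Φ(B), lookup tables stored as flat lists, and addresses formed with the first listed bit as the most significant bit; the function name `evaluate_two_level` and the example tables `LUT0` and `LUT1` are hypothetical and do not come from the application.

```python
def evaluate_two_level(lut0, lut1, b_bits, a_bits):
    """Evaluate F(Phi(B), A) with two cascaded lookup tables."""
    # First level: the bits of subset B address LUT0, which returns Phi(B).
    b_addr = 0
    for bit in b_bits:
        b_addr = (b_addr << 1) | bit
    phi = lut0[b_addr]
    # Second level: Phi(B) is prepended to the bits of subset A to address LUT1.
    a_addr = phi
    for bit in a_bits:
        a_addr = (a_addr << 1) | bit
    return lut1[a_addr]

# Hypothetical 4-input function: Phi(B) = x3 XOR x4, F = Phi AND (x1 OR x2).
LUT0 = [0, 1, 1, 0]              # addressed by (x3, x4)
LUT1 = [0, 0, 0, 0, 0, 1, 1, 1]  # addressed by (phi, x1, x2)

print(evaluate_two_level(LUT0, LUT1, b_bits=(0, 1), a_bits=(1, 0)))
```

In this toy case the two tables hold 4 + 8 = 12 entries, compared with 16 entries for a single 4-input table; the gap widens quickly as the number of inputs grows.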
Optionally, in a possible implementation of the second aspect, the step of determining the first lookup table and the second lookup table of the objective function based on the first input sequence and the decomposition condition of the Boolean function includes: approximating the truth table of the objective function based on the first input sequence and the decomposition condition to obtain an approximated truth table; and decomposing the approximated truth table to obtain the first lookup table and the second lookup table.
In this possible implementation, approximate computing targets error-tolerant applications: it introduces small errors into the system in exchange for reduced circuit area, delay, and power consumption. An approximate LUT combines LUT-based computation with approximate computing and has a much lower storage overhead than an exact LUT. An approximate LUT can approximately compute functions with many inputs while keeping circuit area, power consumption, and delay low.
Optionally, in a possible implementation of the second aspect, the decomposition condition of the truth table includes at least one of the following, where the rows of the truth table are indexed by the second subset and the columns by the first subset: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0s and 1s; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
A third aspect of the embodiments of this application provides an electronic device applied to a lookup table scenario. The electronic device includes: an obtaining unit, configured to obtain a first input sequence of an objective function, where the first input sequence includes at least two subsets, the at least two subsets include a first subset and a second subset, the objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is the decomposition condition of the truth table corresponding to a Boolean function; a first determining unit, configured to determine a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset; a second determining unit, configured to determine the output value of a first function based on the first subset and the first lookup table, where the first function is a nested function in the objective function; and a third determining unit, configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
Optionally, in a possible implementation of the third aspect, the obtaining unit is further configured to obtain a second input sequence of the objective function; the electronic device further includes a shuffling unit, configured to shuffle the order of the second input sequence to obtain the first input sequence.
Optionally, in a possible implementation of the third aspect, the shuffling unit is further configured to shuffle the order of the second input sequence to obtain a third input sequence; the shuffling unit is specifically configured to determine that a first error is smaller than a second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
Optionally, in a possible implementation of the third aspect, the expression of the objective function is as follows:
f(x)≈F(Φ(B),A);
where f(x) is the objective function, F(Φ(B),A) is the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
Optionally, in a possible implementation of the third aspect, the first determining unit is specifically configured to approximate the truth table of the objective function based on the first input sequence and the decomposition condition to obtain an approximated truth table; the first determining unit is specifically configured to decompose the approximated truth table to obtain the first lookup table and the second lookup table.
Optionally, in a possible implementation of the third aspect, the decomposition condition of the truth table includes at least one of the following: all elements in a row of the truth table are 0; all elements in a row of the truth table are 1; a row of the truth table is a feature vector containing both 0s and 1s; a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
A fourth aspect of this application provides an electronic device that performs the method in the second aspect or in any possible implementation of the second aspect.
A fifth aspect of this application provides an electronic device including a processor coupled to a memory. The memory stores a program or instructions which, when executed by the processor, cause the electronic device to implement the method in the second aspect or in any possible implementation of the second aspect.
A sixth aspect of this application provides a computer-readable medium storing a computer program or instructions which, when run on a computer, cause the computer to perform the method in the second aspect or in any possible implementation of the second aspect.
A seventh aspect of this application provides a computer program product which, when executed on a computer, causes the computer to perform the method in the second aspect or in any possible implementation of the second aspect.
For the technical effects of the third, fifth, sixth, and seventh aspects, or of any of their possible implementations, refer to the technical effects of the second aspect or of its different possible implementations; details are not repeated here.
It can be seen from the above technical solutions that the embodiments of this application have the following advantages: the circuit can solve the output value of the objective function using multiple lookup tables. The first module and the second module can determine the outputs of different functions based on different lookup tables. That is, the first module determines the output of the first function based on the first subset and the first lookup table, and the second module determines the output of the objective function based on the second subset, the second lookup table, and the output of the first function. Applying Boolean function decomposition to the objective function improves the efficiency of solving it and yields a multi-level logic decomposition. In addition, cascading the first module and the second module reduces the area, delay, and energy consumption of the circuit corresponding to the objective function.
Description of Drawings
FIG. 1 is an example diagram of the exact decomposition of a Boolean function;
FIG. 2 is a schematic structural diagram of the system architecture provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of the multi-level lookup table circuit provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a shuffling module provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the process of approximating and decomposing a truth table provided by an embodiment of this application;
FIG. 6 is an example diagram of solving the optimization problem provided by an embodiment of this application;
FIG. 7 is another schematic structural diagram of the multi-level lookup table circuit provided by an embodiment of this application;
FIG. 8 is another example diagram of solving the optimization problem provided by an embodiment of this application;
FIG. 9 and FIG. 10 are example diagrams of the effect of the multi-level lookup table circuit provided by an embodiment of this application on continuous functions;
FIG. 11 is an example diagram of the effect of the multi-level lookup table circuit provided by an embodiment of this application on a discontinuous function;
FIG. 12 is a schematic diagram of the process of testing the multi-level lookup table circuit provided by an embodiment of this application;
FIG. 13 is a schematic flowchart of the function solving method provided by an embodiment of this application;
FIG. 14 is a schematic flowchart of the approximation method provided by an embodiment of this application;
FIG. 15 is an example comparison of the truth table of the fifth output bit before and after approximation provided by an embodiment of this application;
FIG. 16 is an example comparison of the truth table of the fourth output bit before and after approximation provided by an embodiment of this application;
FIG. 17 is the curve of the exact cosine function obtained with the function solving method provided by an embodiment of this application;
FIG. 18 is a comparison of the approximate cosine function obtained with the function solving method provided by an embodiment of this application and the exact cosine function;
FIG. 19 is a schematic structural diagram of the electronic device provided by an embodiment of this application;
FIG. 20 is another schematic structural diagram of the electronic device provided by an embodiment of this application.
Detailed Description
The embodiments of this application provide a multi-level lookup table circuit, a function solving method, and related devices. An approximate Boolean function decomposition can be applied to any function, so that the output value of the function is solved by cascading a first module and a second module. In addition, cascading multiple modules reduces the delay and energy consumption of the circuit corresponding to the function.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
For ease of understanding, the terms and concepts mainly involved in the embodiments of this application are introduced first.
1. Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes X_s and an intercept b as inputs, and the output of the operation unit may be:
Figure PCTCN2022128135-appb-000001
where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of X_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinearity into the neural network by converting the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is formed by connecting many such single neural units, so the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of that local receptive field, which may be a region composed of several neural units.
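As a minimal sketch of the neural unit described above, the following Python assumes the conventional weighted-sum form f(Σ W_s·X_s + b) with a sigmoid activation. The equation image itself is not reproduced in this text, so that form is an assumption, and the names `sigmoid` and `neural_unit` are illustrative only.

```python
import math

def sigmoid(z):
    """Sigmoid activation, one possible choice for f."""
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(xs, ws, b, activation=sigmoid):
    """Output of one neural unit: activation(sum_s W_s * X_s + b).

    Assumes the standard weighted-sum form; xs and ws have equal length.
    """
    weighted_sum = sum(w * x for w, x in zip(ws, xs))
    return activation(weighted_sum + b)

# Hypothetical inputs: the weighted sum is 0.5 - 0.5 = 0, so sigmoid gives 0.5.
print(neural_unit([1.0, 2.0], [0.5, -0.25], b=0.0))
```

Activation functions of this kind are typical candidates for lookup-table evaluation, since they map a bounded-precision input to a bounded-precision output.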
2. Truth table
A truth table is a table that represents all possible states between the inputs and the outputs of a logical event; it lists the truth values of a propositional formula. Usually 1 denotes true and 0 denotes false.
3. Lookup table (LUT)
A LUT is essentially a random access memory (RAM). Field programmable gate arrays (FPGAs) currently use mostly 4-input LUTs, so each LUT can be regarded as a 16×1 RAM with 4 address lines. After a user describes a logic circuit with a schematic or a hardware description language (HDL), the programmable logic device (PLD)/FPGA development software automatically computes all possible results of the logic circuit and writes them into the RAM in advance. Each logic operation on an input signal is then equivalent to supplying an address, looking up the content stored at that address, and outputting it.
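The view of a 4-input LUT as a 16×1 RAM can be sketched in a few lines of Python. This is only an illustration: `build_lut`, `lut_read`, and the example `circuit` are hypothetical names, and the first input is assumed to be the most significant address bit.

```python
from itertools import product

def build_lut(logic_fn, num_inputs=4):
    """Precompute every output of logic_fn, as the development software would."""
    return [logic_fn(*bits) for bits in product((0, 1), repeat=num_inputs)]

def lut_read(lut, *bits):
    """Treat the input signals as address lines and read the stored bit."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return lut[address]

def circuit(a, b, c, d):
    """Hypothetical logic circuit: (a AND b) OR (c XOR d)."""
    return (a & b) | (c ^ d)

ram = build_lut(circuit)   # 16 one-bit entries, i.e. a 16x1 RAM
print(len(ram), lut_read(ram, 1, 1, 0, 0))
```

Once the table is filled, evaluating the circuit costs one memory read regardless of how complex the original logic was, which is why the table size, not the logic depth, dominates the cost.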
4. Disjoint decomposition of Boolean functions
Let f be an n-input Boolean function with inputs X = {x_1, x_2, …, x_n}, and partition X into a set A (which may be called the free set) and a set B (which may be called the constraint set, or bound set). If there exist functions F and Φ such that f(X) = F(Φ(B), A), then f is said to have a disjoint decomposition with respect to the free set A and the constraint set B.
However, not every Boolean function has a disjoint decomposition. The necessary and sufficient condition for one to exist is given by Theorem 1 below.
Theorem 1: Partition the input X of a Boolean function into sets A and B. The Boolean function f has a disjoint decomposition with respect to the free set A and the constraint set B if and only if every row of the two-dimensional truth table, with A indexing the rows and B indexing the columns, belongs to one of the following four types:
Type 1: all elements in the row are 0;
Type 2: all elements in the row are 1;
Type 3: the row is a feature vector containing both 0s and 1s;
Type 4: the row is the vector obtained by inverting the feature vector bit by bit.
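The condition of Theorem 1 can be checked mechanically. The sketch below is one possible way to do so in Python, assuming the two-dimensional truth table is given as a list of rows (one per assignment of A) of 0/1 values (one per assignment of B). The function name `classify_rows` and the example table are hypothetical, and the first mixed row encountered is arbitrarily taken as the feature vector.

```python
def classify_rows(table):
    """Return (feature_vector, row_types) if every row fits Theorem 1, else None."""
    feature = None
    types = []
    for row in table:
        row = list(row)
        if all(v == 0 for v in row):
            types.append(1)                  # Type 1: all zeros
        elif all(v == 1 for v in row):
            types.append(2)                  # Type 2: all ones
        else:
            if feature is None:
                feature = row                # first mixed row fixes the feature vector
            if row == feature:
                types.append(3)              # Type 3: equals the feature vector
            elif [1 - v for v in row] == feature:
                types.append(4)              # Type 4: bitwise complement
            else:
                return None                  # a further row pattern: no decomposition
    return feature, types

# Hypothetical 4x4 table whose rows are of types 3, 4, 2 and 4.
example = [[0, 1, 1, 0],
           [1, 0, 0, 1],
           [1, 1, 1, 1],
           [1, 0, 0, 1]]
print(classify_rows(example))
```

Swapping the roles of the feature vector and its complement only exchanges the labels 3 and 4, so the choice of which mixed row defines the feature vector does not affect whether a decomposition exists.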
Exemplarily, FIG. 1 shows an example of the exact decomposition of a Boolean function, where the free set is A = {x_1, x_2}, the constraint set is B = {x_3, x_4}, and the four rows of the two-dimensional truth table belong to types 3, 4, 2, and 4, respectively. A disjoint decomposition f(X) = F(Φ(B), A) therefore exists, where
Figure PCTCN2022128135-appb-000002
At present, the number of rows in a LUT grows exponentially with the number of function inputs, so circuits that use this technique have a large area, high delay, and high power consumption. How to reduce the area, delay, and power consumption of the circuit corresponding to a LUT is therefore a technical problem that urgently needs to be solved.
To solve the above technical problem, the embodiments of this application provide a multi-level lookup table circuit, a function solving method, and related devices. An approximate Boolean function decomposition can be applied to any function, so that the output value of the function is solved by cascading a first module and a second module. In addition, cascading multiple modules reduces the area, delay, and energy consumption of the circuit corresponding to the function.
FIG. 2 is a diagram of the system architecture in which the circuit provided by an embodiment of this application is applied. The system architecture shown in FIG. 2 includes a control unit 201 and a multi-level lookup table unit 202 connected to the control unit 201.
The operation flow of the system is as follows: the control unit 201 reads the input of the function, and the control unit 201 looks up the output value of the function in the multi-level lookup table unit 202.
Optionally, the system architecture may further include an arithmetic logic unit (ALU) 203, which is connected to the control unit 201.
In that case the operation flow of the system can be adjusted as follows: the control unit 201 reads the input of the function; the control unit 201 looks up feature information of the function (for example, derivative information) in the multi-level lookup table unit 202; and the arithmetic logic unit 203 performs an operation on the feature information and the function input to obtain the output of the function.
In a possible implementation, the system architecture with an ALU is generally suited to the decomposition of exact LUTs, and the system architecture without an ALU is generally suited to the decomposition of approximate LUTs. Exact LUTs and approximate LUTs are described later and are not detailed here.
The system architecture provided in the embodiments of this application can be applied to scenarios such as optical modules, wireless communication, and neural networks, which is not limited here. In some cases the system may also be called an accelerated computing system.
The multi-level lookup table unit mentioned above (which may also be called a multi-level lookup table circuit) is described in detail below.
The multi-level lookup table circuit provided in the embodiments of this application can be applied to multiple-input single-output scenarios and to multiple-input multiple-output scenarios, which are described separately below.
I. Multiple-input single-output scenario.
Taking a two-level lookup circuit as an example, the multi-level lookup table circuit provided in the embodiments of this application may be as shown in FIG. 3. The circuit solves the output value of the objective function based on multiple lookup tables, which include a first lookup table and a second lookup table. The first input sequence of the objective function includes a first subset and a second subset; the first subset can be understood as one subsequence of the first input sequence, and the second subset as another subsequence of the first input sequence. The multi-level lookup table circuit includes a first module 301 and a second module 302.
The first module 301 is configured to determine the output value of a first function based on the first subset and the first lookup table, where the first function is a nested function in the objective function.
The second module 302 is configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
Optionally, the expression of the objective function is f(X) = F(Φ(B), A), where f(X) is the objective function, F(Φ(B), A) is the objective function or the approximated objective function, the first input sequence is X' = {x'_n, …, x'_1}, the first subset is B = x'_b, …, x'_1, the second subset is A = x'_n, …, x'_{b+1}, and Φ(B) is the first function. The first lookup table then has b inputs, and the second lookup table has n−b+1 inputs.
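The input counts above translate directly into table sizes. As a rough illustration, assuming one output bit per table entry, a single n-input table needs 2^n entries, while the two cascaded tables need 2^b + 2^(n−b+1) entries. The helper names and the values n = 16, b = 8 below are hypothetical.

```python
def single_lut_entries(n):
    """Entries of one exact LUT over all n inputs."""
    return 2 ** n

def two_level_entries(n, b):
    """Entries of the first LUT (b inputs) plus the second LUT (n - b + 1 inputs)."""
    return 2 ** b + 2 ** (n - b + 1)

n, b = 16, 8  # hypothetical sizes
print(single_lut_entries(n), two_level_entries(n, b))
```

For these values the counts are 65536 and 768, which gives a sense of why the cascaded structure can reduce area when the (approximate) decomposition is acceptable.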
By cascading the first module and the second module, the multi-level lookup table circuit reduces the area, delay, and energy consumption of the circuit corresponding to the objective function. Together, the first module and the second module implement the decomposition of an approximate Boolean function.
Optionally, the multi-level lookup table circuit may further include a shuffling module 303, configured to obtain the second input sequence of the objective function, shuffle the order of the second input sequence to obtain the first input sequence, and split the first input sequence into the first subset and the second subset. The shuffling module 303 is further configured to send the first subset to the first module 301 and the second subset to the second module 302.
Optionally, the structure of the shuffling module 303 may be as shown in FIG. 4. The shuffling module 303 reorders (shuffles) the second input sequence to obtain the first input sequence X' = {x'_n, …, x'_1}. The first b inputs are fed to the first module 301 as the first subset B, and the other n−b inputs are fed to the second module 302 as the second subset A. As shown in FIG. 4, the shuffling module 303 can be implemented with n n-to-1 multiplexers (MUXes); the j-th MUX is controlled by the signal S_j, which determines the specific shuffle.
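A behavioral sketch of the shuffling module, written in Python, might look as follows. It models each n-to-1 MUX as an index selection and assumes zero-based positions, with position 0 of the shuffled list corresponding to x'_1; the names `shuffle_inputs`, `split_subsets`, and `selects` are illustrative and not taken from the application.

```python
def shuffle_inputs(x, selects):
    """Model n n-to-1 multiplexers: output j forwards input number selects[j]."""
    if sorted(selects) != list(range(len(x))):
        raise ValueError("selects must be a permutation of the input indices")
    return [x[s] for s in selects]

def split_subsets(shuffled, b):
    """The first b shuffled inputs form subset B, the remaining n - b form subset A."""
    return shuffled[:b], shuffled[b:]

# Hypothetical 5-input example with b = 2.
second_sequence = [1, 0, 0, 1, 1]
first_sequence = shuffle_inputs(second_sequence, selects=[4, 2, 0, 1, 3])
subset_b, subset_a = split_subsets(first_sequence, b=2)
print(first_sequence, subset_b, subset_a)
```

The permutation check reflects the assumption that the select signals S_j route every input to exactly one output; the hardware itself would not need such a check if the configuration is generated correctly.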
In this approach, introducing a shuffling module to shuffle the inputs of the objective function improves the expressive power of the subsequent approximate LUT.
Optionally, the objective function in the embodiments of this application may be a continuous function, or one of some discontinuous functions, that admits Boolean function decomposition. It may also be a function that does not satisfy the decomposition condition, where the decomposition condition is the decomposition condition of the truth table corresponding to a Boolean function. The decomposition condition of the truth table is as described in Theorem 1 above and is not repeated here.
Optionally, if the objective function does not satisfy the decomposition condition, the multi-level lookup table circuit may further include a configuration module 304, configured to approximate the truth table corresponding to the objective function to obtain an approximated truth table, and to decompose the approximated truth table into the first lookup table and the second lookup table. The configuration module 304 is further configured to send the first lookup table to the first module 301 and the second lookup table to the second module 302.
Optionally, the configuration module 304 can also be understood as configuring the first lookup table and the second lookup table: following a certain timing, it writes the input ordering information to the shuffling module 303 and writes the approximate-LUT data to the first module and the second module.
In addition, the shuffling module 303 can also shuffle the second input sequence multiple times to obtain different shuffled input sequences. After output values are obtained based on the different shuffled input sequences, the error between each output value and the actual output of the objective function is compared, and the input sequence with the smallest error is taken as the first input sequence.
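The selection among several shuffles can be sketched as a simple search loop. In the Python below, `error_of` is a caller-supplied, hypothetical function that returns the error obtained for a given input order; how that error is computed (approximating the truth table, evaluating the LUTs, comparing with the actual output) is outside this sketch, and the number of trials and the seed are arbitrary assumptions.

```python
import random

def pick_input_order(n, error_of, trials=20, seed=0):
    """Try several random input orders and keep the one with the smallest error."""
    rng = random.Random(seed)
    best_order = list(range(n))          # start from the unshuffled order
    best_error = error_of(best_order)
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)               # one candidate shuffle
        err = error_of(order)
        if err < best_error:             # keep the smaller error
            best_order, best_error = order, err
    return best_order, best_error

def toy_error(order):
    """Toy error model for illustration only: smaller when input 2 comes first."""
    return abs(order[0] - 2)

order, err = pick_input_order(4, toy_error)
print(order, err)
```

A random search is only one option; an exhaustive search over all n! orders is exact but quickly becomes impractical as n grows.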
The following describes how the truth table corresponding to the objective function is approximated and how the approximated truth table is decomposed into the first lookup table and the second lookup table. The process can be understood as follows: if the objective function does not satisfy Theorem 1, its truth table is modified by introducing an approximation so that the approximated truth table can be decomposed.
Exemplarily, as shown in (a) of FIG. 5, the original truth table of the objective function does not satisfy Theorem 1; that is, there is no disjoint decomposition with respect to the second subset (the free set, A = {x_1, x_2}) and the first subset (the constraint set, B = {x_3, x_4}). Introducing an approximation makes the two-dimensional truth table in (b) of FIG. 5 satisfy Theorem 1, so the original objective function can be approximately decomposed into two functions F and Φ. The resulting first lookup table LUT0 and second lookup table LUT1 are shown in FIG. 5.
The approximation in this embodiment can be understood as an optimization problem whose goal is to minimize the error rate and whose unknowns are the feature vector V and the category vector T. The category vector T indicates the type (as defined in Theorem 1) of each row of the truth table; for example, the category vector corresponding to (b) of FIG. 5 is T = (1,3,1,3). A given pair of feature vector V and category vector T corresponds to a unique approximate truth table. The truth table may be multi-dimensional; a two-dimensional truth table is used here only as an example.
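To make the role of V and T concrete, the following Python sketch rebuilds the approximate two-dimensional truth table from a feature vector and a category vector, and derives one possible pair of lookup tables from them. The mapping of LUT0 to V and the addressing of LUT1 by (Φ, row index) are assumptions made for illustration; the category vector T = (1,3,1,3) is the one quoted above, while the feature vector used here is hypothetical.

```python
def table_from_v_t(v, t):
    """Rebuild the approximate 2-D truth table from feature vector V and types T."""
    row_for_type = {
        1: [0] * len(v),             # Type 1: all zeros
        2: [1] * len(v),             # Type 2: all ones
        3: list(v),                  # Type 3: the feature vector
        4: [1 - bit for bit in v],   # Type 4: its bitwise complement
    }
    return [list(row_for_type[k]) for k in t]

def luts_from_v_t(v, t):
    """Derive LUT0 = Phi(B) and LUT1 = F(phi, A) under the assumptions above."""
    lut0 = list(v)                                # Phi(B) read per column
    f_when_phi0 = {1: 0, 2: 1, 3: 0, 4: 1}        # row value where Phi = 0
    f_when_phi1 = {1: 0, 2: 1, 3: 1, 4: 0}        # row value where Phi = 1
    lut1 = [f_when_phi0[k] for k in t] + [f_when_phi1[k] for k in t]
    return lut0, lut1

v = [0, 1, 1, 0]      # hypothetical feature vector
t = [1, 3, 1, 3]      # category vector from the example above
table = table_from_v_t(v, t)
lut0, lut1 = luts_from_v_t(v, t)
print(table, lut0, lut1)
```

With this layout, entry `lut1[phi * len(t) + a]` reproduces `table[a][b]` whenever `phi = lut0[b]`, which is exactly the cascade f = F(Φ(B), A).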
The optimization problem is solved in the following steps:
Step 1: Randomly initialize the feature vector V.
Step 2: Alternately optimize the category vector T and the feature vector V.
Step 2 may include the following three sub-steps:
Step 2.1: Fix the feature vector V and change the category vector T to minimize the error rate.
Step 2.2: Fix the category vector T and change the feature vector V to minimize the error rate.
Step 2.3: If neither V nor T changed, go to Step 3; otherwise return to Step 2.1.
Step 3: Determine the approximate two-dimensional truth table from the feature vector V and the category vector T.
For an example of an exact two-dimensional truth table and of the steps above, see FIG. 6. In that example the approximated two-dimensional truth table has an error rate of only 1/8.
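Steps 1 to 3 above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the application. It assumes that rows of the two-dimensional table are indexed by the free set and columns by the constraint set, and that the row types are numbered 1 = all zeros, 2 = all ones, 3 = equal to V, 4 = complement of V (Theorem 1 is outside this section, so this numbering is an assumption). All function names are hypothetical.

```python
import random

# Assumed row-type encoding (Theorem 1 itself is outside this section):
# 1 = all zeros, 2 = all ones, 3 = equal to V, 4 = bitwise complement of V.
def row_pattern(row_type, V):
    if row_type == 1:
        return [0] * len(V)
    if row_type == 2:
        return [1] * len(V)
    if row_type == 3:
        return list(V)
    return [1 - v for v in V]


def mismatches(row, pattern):
    return sum(a != b for a, b in zip(row, pattern))


def build_table(V, T):
    # Step 3: a (V, T) pair determines exactly one approximate table.
    return [row_pattern(t, V) for t in T]


def best_types(table, V):
    # Step 2.1: V fixed; each row independently takes its closest type.
    return [min((1, 2, 3, 4), key=lambda t: mismatches(row, row_pattern(t, V)))
            for row in table]


def best_vector(table, T, V):
    # Step 2.2: T fixed; each bit of V is set by majority vote of the
    # type-3 rows (which want V[c] == row[c]) and type-4 rows (the opposite).
    new_V = list(V)
    for c in range(len(V)):
        wanted = [row[c] if t == 3 else 1 - row[c]
                  for row, t in zip(table, T) if t in (3, 4)]
        ones = sum(wanted)
        zeros = len(wanted) - ones
        if ones != zeros:  # on a tie, keep the current bit
            new_V[c] = 1 if ones > zeros else 0
    return new_V


def approximate(table, V0=None, seed=0, max_rounds=100):
    rng = random.Random(seed)
    # Step 1: random initial V unless the caller supplies one.
    V = list(V0) if V0 is not None else [rng.randint(0, 1) for _ in table[0]]
    T = best_types(table, V)
    for _ in range(max_rounds):
        new_V = best_vector(table, T, V)
        new_T = best_types(table, new_V)
        if new_V == V and new_T == T:  # Step 2.3: nothing changed
            break
        V, T = new_V, new_T
    return V, T, build_table(V, T)


def error_rate(table, approx):
    cells = sum(len(row) for row in table)
    wrong = sum(mismatches(r, a) for r, a in zip(table, approx))
    return wrong / cells
```

Because each sub-step only ever keeps or lowers the number of mismatched cells, the loop settles on a local optimum; it is not guaranteed to find the globally best V and T, which is why a random restart (a different `seed`) may be worth trying in practice.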
2. Multiple-input multiple-output scenario.
The multi-level lookup table circuit provided in the embodiments of this application may be as shown in FIG. 7. The circuit solves the output value of the objective function based on multiple lookup tables; the number of lookup tables is m, and the second input sequence of the objective function is X = {x_n, …, x_1}. The circuit includes m modules (the first module, ..., the m-th module). Each module corresponds to the multi-level lookup table circuit of FIG. 3 described above, and the number of such circuits equals the number of outputs.
The approximation in this embodiment can be treated as an optimization problem whose goal is to minimize the normalized mean error distance (NMED) and whose variables are the feature vector V and the category vector T. The feature vector V and the category vector T are the same as in the multiple-input single-output scenario described above and are not described again here.
NMED may be calculated as follows:
[Formula image: PCTCN2022128135-appb-000003]
Here the objective function has I inputs and O outputs, and p_i is the probability that the i-th input combination occurs. The symbol shown in image PCTCN2022128135-appb-000004 and g_i are, for the i-th input combination, the output values obtained by concatenating all output bits in binary of the approximated function (also called the approximate function) and of the objective function (also called the exact function), respectively.
The optimization can be broken down by output bit: going from the most significant to the least significant binary output bit, the function on each output bit is optimized in turn. When the k-th output bit is optimized, the approximate functions of all other bits are held fixed, and the optimal shuffle for the k-th output bit, together with the optimal feature vector V and category vector T, is determined so that the NMED between the approximated function and the objective function is minimized.
For example, the multi-output optimization problem may be as shown in FIG. 8. Taking a 3-bit binary output as an example: when the optimal shuffle, V, and T are determined for the most significant bit, the truth tables of the middle bit and the least significant bit are fixed. When they are determined for the middle bit, the truth tables of the most significant bit and the least significant bit are fixed. When they are determined for the least significant bit, the truth tables of the most significant bit and the middle bit are fixed.
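The bit-by-bit procedure is a form of coordinate descent over the output bits. The sketch below shows only that outer loop. The per-bit search over shuffles, V, and T is replaced by a hypothetical list of candidate bit columns per output bit (`candidates`), and the unnormalized mean error distance is used because dividing by a constant does not change which candidate is best. Bits that have not been approximated yet start from their exact values.

```python
def mean_error_distance(exact, approx):
    return sum(abs(a - g) for a, g in zip(approx, exact)) / len(exact)


def compose(bit_columns):
    # bit_columns[k][i] is output bit k (0 = least significant) for input i.
    n = len(bit_columns[0])
    return [sum(col[i] << k for k, col in enumerate(bit_columns))
            for i in range(n)]


def optimize_bits(exact, num_bits, candidates, max_rounds=10):
    # Start from the exact bit columns as stand-ins for undecided bits.
    current = [[(g >> k) & 1 for g in exact] for k in range(num_bits)]
    previous = None
    score = None
    for _ in range(max_rounds):
        for k in reversed(range(num_bits)):  # most significant bit first
            def cost(column):
                trial = current[:k] + [column] + current[k + 1:]
                return mean_error_distance(exact, compose(trial))
            current[k] = min(candidates[k], key=cost)  # other bits held fixed
        score = mean_error_distance(exact, compose(current))
        if previous is not None and score >= previous:
            break  # no further improvement
        previous = score
    return current, score
```

In a full implementation each entry of `candidates[k]` would be the bit column of an approximate truth table produced by one shuffle and one (V, T) pair for output bit k.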
In the embodiments of this application, first, the circuit can solve the output value of the objective function using multiple lookup tables. The first module and the second module determine the outputs of different functions based on different lookup tables: the first module determines the output of the first function based on the first subset and the first lookup table, and the second module determines the output of the objective function based on the second subset, the second lookup table, and the output of the first function. Decomposing the objective function as a Boolean function improves the efficiency of multi-level logic function decomposition and yields a multi-level logic decomposition result. Second, cascading the first module and the second module reduces the area, delay, and energy consumption of the circuit that implements the objective function. Third, for a function that does not satisfy the Boolean function decomposition condition, the corresponding truth table is approximated and the approximated truth table is then decomposed, so the approach applies to the decomposition of all functions and the objective function can be solved with a logic circuit. In other words, a small error is introduced into the system in exchange for lower circuit area, delay, and power consumption. Compared with an exact LUT, the approximate approach greatly reduces storage overhead, and it can approximately compute functions with many inputs while keeping circuit area, power consumption, and delay low.
To show the benefits of the multi-level lookup table circuit proposed in the embodiments of this application more concretely, the following compares the two-level lookup table circuit with the prior art on continuous and non-continuous functions. The prior art considered is a LUT scheme that rounds off the least significant bits (Round) and an approximate LUT that stores derivatives (ApproxLUT).
1. Continuous functions.
The continuous functions selected are listed in Table 1. The input sequence and the output value of each continuous function are quantized to 16 bits. The constraint set has size 9, so the first-level LUT has 512 rows; the free set has size 7, so the second-level LUT has 256 rows. The continuous functions are approximately decomposed with this configuration.
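The row counts follow from simple arithmetic, sketched below under the assumption that the first function Φ produces a single bit, so the second-level LUT is addressed by the free-set bits plus that one bit. The function name and the `phi_bits` parameter are hypothetical.

```python
def lut_rows(constraint_bits, free_bits, phi_bits=1):
    first_level = 2 ** constraint_bits           # LUT addressed by B
    second_level = 2 ** (free_bits + phi_bits)   # LUT addressed by (Phi(B), A)
    exact = 2 ** (constraint_bits + free_bits)   # undecomposed LUT
    return first_level, second_level, exact
```

For the configuration above this gives 512 + 256 = 768 rows per output bit, against 65,536 rows per output bit for an undecomposed 16-input table, which is about 1.2% of the entries before counting any routing or multiplexing logic.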
Table 1
[Table image: PCTCN2022128135-appb-000005]
The results for Table 1 are shown in FIG. 9, where the decomposition-based approximate lookup table architecture (DALTA) is the multi-level lookup table circuit provided in the embodiments of this application. FIG. 9 shows that the curves of the continuous functions produced by the multi-level lookup table circuit almost coincide with those of the exact LUT (that is, solving the function with a lookup table that can be decomposed according to Theorem 1), which indicates that the error of the multi-level lookup table circuit is very small.
In addition, the comparison with the prior-art Round and ApproxLUT schemes is shown in FIG. 10. While also lowering the error, the multi-level lookup table circuit provided in this embodiment reduces area by 97.5%, latency by 40.7%, and energy by 99%. At the same error as ApproxLUT, DALTA reduces latency by 92.4% and energy by 56.5% relative to ApproxLUT.
2. Non-continuous functions.
The non-continuous functions selected are listed in Table 2. The input sequence of each non-continuous function is quantized to 16 bits. The constraint set has size 9, so the first-level LUT has 512 rows; the free set has size 7, so the second-level LUT has 256 rows. The non-continuous functions are approximately decomposed with this configuration.
Table 2
Non-continuous function   Input bits   Output bits   Application scenario
Brent-Kung                16           9             Arithmetic operation
Forwardk2j                16           16            Robotics
Inversek2j                16           16            Robotics
Multiplier                16           16            Arithmetic operation
The results for Table 2 are shown in Table 3 and FIG. 11. Table 3 compares the solution provided in the embodiments of this application (DALTA) with ApproxLUT on non-continuous functions. With the same storage consumption, DALTA has a much lower error than ApproxLUT, because ApproxLUT relies on Taylor expansion, which performs poorly on non-continuous functions. FIG. 11 shows that, compared with Round and with a lower error, DALTA reduces area by 95.8%, delay by 39.0%, and energy consumption by 98.3%.
Table 3
[Table image: PCTCN2022128135-appb-000006]
In addition, the effect of the multi-level lookup table circuit provided in the embodiments of this application is further illustrated with the cosine function as the objective function. As shown in FIG. 12, the number of output bits m is fixed at 16 and the number of input bits n is increased from 8 to 16. The free-set size and the constraint-set size are each set to n/2 rounded to an integer (rounded up or rounded down), and a multi-level lookup table circuit is then generated from m, n, the free set, and the constraint set. The error, area, delay, and power consumption of the circuit are measured; the results are shown in Table 4.
Table 4
Input bits/output bits   NMED     Area    Delay   Power consumption
8/16                     0.230%   3.0×    1.7×    4.5×
9/16                     0.121%   4.3×    1.9×    6.4×
10/16                    0.124%   6.2×    1.8×    9.2×
11/16                    0.052%   9.4×    2.2×    13.9×
12/16                    0.060%   13.6×   2.4×    20.2×
13/16                    0.028%   20.7×   2.6×    30.8×
14/16                    0.027%   29.2×   2.5×    43.3×
15/16                    0.013%   45.0×   2.7×    66.7×
16/16                    0.014%   62.4×   1.6×    92.5×
Here × denotes the improvement factor relative to the exact LUT. Table 4 shows that, as the number of inputs increases, the error NMED decreases overall (with small upward steps between some adjacent rows), and the area and power consumption of this application fall exponentially relative to the exact LUT: the improvement factors rise from 3.0× to 62.4× for area and from 4.5× to 92.5× for power consumption.
At present, only Boolean functions that satisfy Theorem 1 above can be decomposed, so lookup table techniques based on Boolean function decomposition cannot be applied to all functions.
For this reason, the embodiments of this application provide a function solving method and a related device. Any function can be approximately decomposed as a Boolean function into at least two lookup tables, and the output value of the function is then solved from the at least two lookup tables.
The function solving method provided in the embodiments of this application is described below.
The function solving method provided in the embodiments of this application can be applied to scenarios in which functions are solved with lookup tables, such as optical modules, wireless communications, and neural networks; this is not limited here. The method may be performed by an electronic device or by a component of an electronic device (for example, a processor, a chip, or a chip system); this is not limited here.
Referring to FIG. 13, the method includes steps 1301 to 1304, which are described below.
Step 1301: Obtain the first input sequence of the objective function.
Optionally, before this step, a second input sequence of the objective function may be obtained, and the order of the second input sequence is shuffled to obtain the first input sequence.
Further, the second input sequence may also be shuffled to obtain a third input sequence, and it is determined that a first error is smaller than a second error. The first error is the error between the output value of the objective function obtained based on the first input sequence and the actual output value of the objective function. The second error is the error between the output value of the objective function obtained based on the third input sequence and the actual output value of the objective function.
The embodiments of this application do not limit how, or how many times, the second input sequence is shuffled.
For how the optimal shuffle is determined after shuffling, see the description of the embodiments corresponding to FIG. 6 or FIG. 8 above; details are not repeated here. The input sequence obtained with the optimal shuffle is the first input sequence.
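One way to picture a shuffle is as a choice of which input bits go to the free set and which go to the constraint set. The sketch below is an illustration under assumptions, not the method of the application: it extracts the two index values from an input word, with bit position 1 taken as x_1 (the least significant bit) and the positions listed most significant first. The function name is hypothetical.

```python
def split_input(x, free_positions, constraint_positions):
    """Return (a, b): the free-set and constraint-set index values of x.

    Positions are 1-based (position 1 is x_1, the least significant bit)
    and are listed most significant first.
    """
    def gather(positions):
        value = 0
        for pos in positions:
            value = (value << 1) | ((x >> (pos - 1)) & 1)
        return value

    return gather(free_positions), gather(constraint_positions)
```

For the 5-bit input 0b10110 (x_5 = 1, x_4 = 0, x_3 = 1, x_2 = 1, x_1 = 0), the partition with free set {x_1, x_2, x_3} and constraint set {x_4, x_5} gives a = 0b110 and b = 0b10. Comparing the approximation error obtained under two such partitions corresponds to comparing the first error and the second error described above.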
The first input sequence in the embodiments of this application includes at least two subsets, which include a first subset and a second subset. The objective function is a function that does not satisfy the decomposition condition, where the decomposition condition is the condition under which the truth table of a Boolean function can be decomposed. This decomposition condition can be understood as Theorem 1 above; details are not repeated here.
Step 1302: Determine a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition.
After the first input sequence of the objective function is obtained, the first lookup table and the second lookup table of the objective function may be determined based on the first input sequence and the decomposition condition. The first lookup table is related to the first subset, and the second lookup table is related to the second subset.
Specifically, this step may include: approximating the truth table of the objective function based on the first input sequence and the decomposition condition to obtain an approximated truth table that satisfies the decomposition condition. The approximated truth table can then be decomposed into the first lookup table and the second lookup table.
For the approximation in this step, see the description of the foregoing embodiments; details are not repeated here.
Step 1303: Determine the output value of the first function based on the first subset and the first lookup table.
After the first lookup table and the second lookup table are determined, the output value of the first function may be determined based on the first subset and the first lookup table. The first function can be understood as a function nested inside the objective function.
Step 1304: Determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
After the first lookup table and the second lookup table are determined, the output value of the objective function may be determined based on the second subset, the second lookup table, and the output value of the first function. Because this output value is obtained from lookup tables decomposed from the approximated truth table, it can be understood as an approximate output value of the objective function, or as the output value of the approximated objective function.
In this embodiment, the objective function may be expressed as follows:
f(X) ≈ F(Φ(B), A);
where f(X) is the objective function, F(Φ(B), A) is the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
In this embodiment, an objective function that does not satisfy the decomposition condition can be approximately decomposed as a Boolean function into at least two lookup tables, and the output value of the objective function (which can also be understood as an approximate value) is then solved from the at least two lookup tables.
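Steps 1303 and 1304 amount to two chained table lookups. A minimal sketch for a single output bit follows, assuming that Φ produces one bit and that the second table is stored flat and addressed by Φ(B) concatenated with A. The storage layout and the names are assumptions made for illustration.

```python
def evaluate_two_level(lut0, lut1, a, b, num_free_bits):
    phi = lut0[b]                              # step 1303: first function Phi(B)
    return lut1[(phi << num_free_bits) | a]    # step 1304: F(Phi(B), A)
```

For example, with `lut0 = [0, 1, 1, 0]` (Φ is the XOR of the two bits of B) and `lut1 = [0, 0, 0, 1]` (F is the AND of Φ and the single bit of A), the call `evaluate_two_level(lut0, lut1, 1, 1, 1)` looks up Φ = 1 and then F = 1.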
The following takes a cosine objective function as an example, with a 5-bit input sequence (x_5, x_4, x_3, x_2, x_1), a 5-bit output value (y_5, y_4, y_3, y_2, y_1), a free set of size 3, and a constraint set of size 2. It describes the approximation in step 1302, the approximation of the truth tables in the multiple-input multiple-output scenario, and the cosine curve obtained from the five truth tables (one per output bit) decomposed after approximation.
The approximation process may be as shown in FIG. 14 and includes steps 1401 to 1404, which are described below:
Step 1401: Decompose the most significant bit (bit 5) with the remaining bits (bits 1 to 4) fixed.
Because the approximate functions on bits 1 to 4 have not yet been determined, this embodiment uses the exact functions of bits 1 to 4 as estimates of the approximate functions on those bits. The optimal shuffle for bit 5 is then found, namely free set {x_1, x_2, x_3} and constraint set {x_4, x_5}, together with the optimal feature vector V = {1, 1, 0, 0} and category vector T = {3, 3, 3, 3, 3, 3, 3, 3} under that shuffle, so that the NMED between the approximate function and the exact function is minimized. As shown in FIG. 15, under this shuffle the exact and approximate two-dimensional truth tables for bit 5 are identical.
Step 1402: Decompose bit 4 with the remaining bits (bits 1 to 3 and bit 5) fixed.
Because the approximate functions on bits 1 to 3 have not yet been determined, this embodiment uses the exact functions of bits 1 to 3 as estimates of the approximate functions on those bits; the approximate function on bit 5 has already been obtained in step 1401. The optimal shuffle for bit 4 is then found, namely free set {x_3, x_4, x_5} and constraint set {x_1, x_2}, together with the optimal feature vector V = {1, 1, 1, 0} and category vector T = {2, 2, 3, 1, 2, 1, 1, 1} under that shuffle, so that the NMED between the approximate function and the exact function is minimized. As shown in FIG. 16, under this shuffle the exact and approximate two-dimensional truth tables for bit 4 differ in one position.
Step 1403: Continue in the same way and decompose bits 3, 2, and 1 in turn.
Decompose bit 3 with the remaining bits (bits 1, 2, 4, and 5) fixed. Decompose bit 2 with the remaining bits (bit 1 and bits 3 to 5) fixed. Decompose bit 1 with the remaining bits (bits 2 to 5) fixed. This yields the approximate function on each output bit.
Step 1404: Repeat steps 1401 to 1403 until a preset condition is met.
The preset condition here includes that the NMED no longer decreases, or that the difference between the approximate output value and the true output value of the function is smaller than a preset threshold.
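The V and T values reported in steps 1401 and 1402 can be turned into the two lookup tables for one output bit as sketched below. The sketch assumes that the row types are numbered 1 = constant 0, 2 = constant 1, 3 = Φ, 4 = NOT Φ, and that V and T are indexed by the binary value of the constraint-set bits and free-set bits respectively. Both are assumptions, since Theorem 1 and the index order are not given in this section, and the function name is hypothetical.

```python
def luts_from_vt(V, T):
    """Build (lut0, lut1) for one output bit from V and T.

    lut0[b] is Phi(B); lut1[phi * len(T) + a] is F(phi, A).
    """
    lut0 = list(V)
    rows = len(T)
    lut1 = [0] * (2 * rows)
    for a, row_type in enumerate(T):
        for phi in (0, 1):
            if row_type == 1:      # assumed: constant 0
                out = 0
            elif row_type == 2:    # assumed: constant 1
                out = 1
            elif row_type == 3:    # assumed: follows Phi
                out = phi
            else:                  # assumed: follows NOT Phi
                out = 1 - phi
            lut1[phi * rows + a] = out
    return lut0, lut1
```

With the bit 5 values V = {1, 1, 0, 0} and T all equal to 3, the second table simply passes Φ through, so under these assumptions bit 5 depends only on x_4 and x_5. The tables have 2^2 = 4 and 2^(3+1) = 16 entries, against 2^5 = 32 entries for the undecomposed bit.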
The exact cosine curve is shown in FIG. 17, and a comparison between the resulting approximate cosine function, with NMED = 0.81%, and the exact cosine function may be as shown in FIG. 18. The approximate curve closely follows the exact curve, which indicates that the result of the function solving method provided in this embodiment is close to the true output value of the function.
The function solving method in the embodiments of this application has been described above; the electronic device in the embodiments of this application is described below. Referring to FIG. 19, one embodiment of the electronic device includes:
an obtaining unit 1901, configured to obtain a first input sequence of an objective function, where the first input sequence includes at least two subsets, the at least two subsets include a first subset and a second subset, the objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is the condition under which the truth table of a Boolean function can be decomposed;
a first determining unit 1902, configured to determine a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset;
a second determining unit 1903, configured to determine an output value of a first function based on the first subset and the first lookup table, where the first function is a function nested inside the objective function; and
a third determining unit 1904, configured to determine an output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
Optionally, the electronic device may further include a shuffling unit 1905, configured to shuffle the order of a second input sequence to obtain the first input sequence.
In this embodiment, the operations performed by the units of the electronic device are similar to those described in the embodiments shown in FIG. 13 to FIG. 18 and are not repeated here.
In this embodiment, an objective function that does not satisfy the decomposition condition can be approximately decomposed as a Boolean function into at least two lookup tables, so that the second determining unit 1903 and the third determining unit 1904 solve the output value of the objective function (which can also be understood as an approximate value) from the at least two lookup tables.
FIG. 20 is a schematic structural diagram of another electronic device provided in this application. The electronic device may include a processor 2001, a memory 2002, and a communication interface 2003, which are interconnected by lines. The memory 2002 stores program instructions and data.
The memory 2002 stores the program instructions and data corresponding to the steps performed by the electronic device in the embodiments shown in FIG. 13 to FIG. 18.
The processor 2001 is configured to perform the steps performed by the electronic device in any of the embodiments shown in FIG. 13 to FIG. 18.
The communication interface 2003 may be used to receive and send data, and to perform the steps related to obtaining, sending, and receiving in any of the embodiments shown in FIG. 13 to FIG. 18.
In one implementation, the electronic device may include more or fewer components than those shown in FIG. 20; FIG. 20 is only an example and does not constitute a limitation.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or integrated. to another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units may be fully or partially realized by software, hardware, firmware or any combination thereof.
当使用软件实现所述集成的单元时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。When the integrated units are implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part. The computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from a website, computer, server or data center Transmission to another website site, computer, server or data center by wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a means of distinguishing objects that share the same attribute when the embodiments of this application are described. In addition, the terms "include" and "have", and any variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that comprises a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.

Claims (21)

  1. A multi-level lookup table circuit, characterized in that the circuit is configured to solve for an output value of an objective function based on a plurality of lookup tables, the plurality of lookup tables comprise a first lookup table and a second lookup table, and a first input sequence of the objective function comprises a first subset and a second subset; the circuit comprises a first module and a second module;
    the first module is configured to determine an output value of a first function based on the first subset and the first lookup table, the first function being a nested function in the objective function;
    the second module is configured to determine the output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  2. The circuit according to claim 1, characterized in that the circuit further comprises a shuffling module;
    the shuffling module is configured to obtain a second input sequence of the objective function, shuffle the order of the second input sequence to obtain the first input sequence, and decompose the first input sequence to obtain the first subset and the second subset;
    the shuffling module is further configured to send the first subset to the first module and send the second subset to the second module.
  3. The circuit according to claim 1, characterized in that the circuit further comprises a configuration module;
    the configuration module is configured to approximate a truth table corresponding to the objective function to obtain an approximated truth table, and to decompose the approximated truth table into the first lookup table and the second lookup table;
    the configuration module is further configured to send the first lookup table to the first module and send the second lookup table to the second module.
  4. The circuit according to any one of claims 1 to 3, characterized in that the expression of the objective function is as follows:
    f(x)=F(Φ(B),A);
    where f(x) is the objective function, F(Φ(B),A) is the objective function or an approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
  5. The circuit according to any one of claims 1 to 4, characterized in that the objective function is a function that does not satisfy a decomposition condition, the decomposition condition being a decomposition condition of the truth table corresponding to a Boolean function.
  6. The circuit according to any one of claims 1 to 4, characterized in that the decomposition condition of the truth table comprises at least one of the following, where the rows of the truth table correspond to the second subset and the columns correspond to the first subset:
    all elements in a row of the truth table are 0;
    all elements in a row of the truth table are 1;
    a row of the truth table is a feature vector containing both 0 and 1;
    a row of the truth table is the vector obtained by inverting the feature vector bit by bit.
  7. A function solving method, characterized in that the method is applied to a lookup table scenario, and the method comprises:
    obtaining a first input sequence of an objective function, where the first input sequence comprises at least two subsets, the at least two subsets comprise a first subset and a second subset, the objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is a decomposition condition of the truth table corresponding to a Boolean function;
    determining a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset;
    determining an output value of a first function based on the first subset and the first lookup table, the first function being a nested function in the objective function;
    determining an output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  8. The method according to claim 7, characterized in that, before the obtaining of the first input sequence of the objective function, the method further comprises:
    obtaining a second input sequence of the objective function;
    shuffling the order of the second input sequence to obtain the first input sequence.
  9. The method according to claim 8, characterized in that the method further comprises:
    shuffling the order of the second input sequence to obtain a third input sequence;
    determining that a first error is less than a second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
  10. The method according to any one of claims 7 to 9, characterized in that the expression of the objective function is as follows:
    f(x)≈F(Φ(B),A);
    where f(x) is the objective function, F(Φ(B),A) is the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
  11. The method according to any one of claims 7 to 10, characterized in that the determining of the first lookup table and the second lookup table of the objective function based on the first input sequence and the decomposition condition of the Boolean function comprises:
    approximating the truth table of the objective function based on the first input sequence and the decomposition condition to obtain an approximated truth table;
    decomposing the approximated truth table to obtain the first lookup table and the second lookup table.
  12. The method according to any one of claims 7 to 11, characterized in that the decomposition condition of the truth table comprises at least one of the following, where the rows of the truth table correspond to the second subset and the columns correspond to the first subset:
    all elements in a row of each truth table are 0;
    all elements in a row of each truth table are 1;
    a row of each truth table is a feature vector containing both 0 and 1;
    a row of each truth table is the vector obtained by inverting the feature vector bit by bit.
  13. An electronic device, characterized in that the electronic device is applied to a lookup table scenario, and the electronic device comprises:
    an obtaining unit, configured to obtain a first input sequence of an objective function, where the first input sequence comprises at least two subsets, the at least two subsets comprise a first subset and a second subset, the objective function is a function that does not satisfy a decomposition condition, and the decomposition condition is a decomposition condition of the truth table corresponding to a Boolean function;
    a first determining unit, configured to determine a first lookup table and a second lookup table of the objective function based on the first input sequence and the decomposition condition, where the first lookup table is related to the first subset and the second lookup table is related to the second subset;
    a second determining unit, configured to determine an output value of a first function based on the first subset and the first lookup table, the first function being a nested function in the objective function;
    a third determining unit, configured to determine an output value of the objective function based on the second subset, the second lookup table, and the output value of the first function.
  14. The electronic device according to claim 13, characterized in that the obtaining unit is further configured to obtain a second input sequence of the objective function;
    the electronic device further comprises:
    a shuffling unit, configured to shuffle the order of the second input sequence to obtain the first input sequence.
  15. The electronic device according to claim 14, characterized in that the shuffling unit is further configured to shuffle the order of the second input sequence to obtain a third input sequence;
    the shuffling unit is specifically configured to determine that a first error is less than a second error, where the first error is the error between the output value obtained based on the first input sequence and the actual output of the objective function, and the second error is the error between the output value obtained based on the third input sequence and the actual output.
  16. The electronic device according to any one of claims 13 to 15, characterized in that the expression of the objective function is as follows:
    f(x)≈F(Φ(B),A);
    where f(x) is the objective function, F(Φ(B),A) is the approximated objective function, B is the first subset, A is the second subset, and Φ(B) is the first function.
  17. The electronic device according to any one of claims 13 to 16, characterized in that the first determining unit is specifically configured to approximate the truth table of the objective function based on the first input sequence and the decomposition condition to obtain an approximated truth table;
    the first determining unit is specifically configured to decompose the approximated truth table to obtain the first lookup table and the second lookup table.
  18. The electronic device according to any one of claims 13 to 17, characterized in that the decomposition condition of the truth table comprises at least one of the following, where the rows of the truth table correspond to the second subset and the columns correspond to the first subset:
    all elements in a row of each truth table are 0;
    all elements in a row of each truth table are 1;
    a row of each truth table is a feature vector containing both 0 and 1;
    a row of each truth table is the vector obtained by inverting the feature vector bit by bit.
  19. An electronic device, characterized by comprising a processor, where the processor is coupled to a memory, the memory is configured to store a program or instructions, and when the program or instructions are executed by the processor, the electronic device is caused to perform the method according to claims 7-12.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions, and when the instructions are executed on a computer, the computer is caused to perform the method according to any one of claims 7 to 12.
  21. A computer program product, characterized in that, when the computer program product is executed on a computer, the computer is caused to perform the method according to any one of claims 7 to 12.
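
The two-level evaluation f(x)=F(Φ(B),A) recited in the claims, together with the row conditions on the truth table, can be illustrated with a small sketch. The Python below is not taken from the application: the function names `decompose`, `approximate`, and `evaluate`, the list-of-rows table layout (rows indexed by A, columns indexed by B), and the nearest-row approximation by Hamming distance are all assumptions made for illustration. In particular, the claims do not say how the feature vector or the approximation is chosen, so `approximate` simply takes a candidate feature vector as an argument.

```python
def decompose(table):
    """Split a truth table (rows = A, columns = B) into two lookup tables.

    Returns (phi_lut, f_lut) when every row is all-0, all-1, one common
    feature vector, or the bitwise complement of that vector.
    Returns None when some row violates that condition.
    """
    width = len(table[0])
    zeros, ones = (0,) * width, (1,) * width

    # Use the first non-constant row as the feature vector (assumption).
    pattern = next((tuple(r) for r in table if tuple(r) not in (zeros, ones)), zeros)
    complement = tuple(1 - bit for bit in pattern)

    f_lut = []  # f_lut[a] = (F(0, a), F(1, a))
    for row in table:
        row = tuple(row)
        if row == zeros:
            f_lut.append((0, 0))      # F ignores phi and outputs 0
        elif row == ones:
            f_lut.append((1, 1))      # F ignores phi and outputs 1
        elif row == pattern:
            f_lut.append((0, 1))      # F outputs phi
        elif row == complement:
            f_lut.append((1, 0))      # F outputs NOT phi
        else:
            return None               # decomposition condition not met
    return list(pattern), f_lut


def approximate(table, pattern):
    """Snap each row to the nearest allowed row (hypothetical heuristic)."""
    width = len(pattern)
    candidates = [
        (0,) * width,
        (1,) * width,
        tuple(pattern),
        tuple(1 - bit for bit in pattern),
    ]

    def distance(u, v):
        return sum(x != y for x, y in zip(u, v))

    return [list(min(candidates, key=lambda c: distance(row, c))) for row in table]


def evaluate(phi_lut, f_lut, a, b):
    """First level: phi = Phi(B). Second level: F(phi, A)."""
    phi = phi_lut[b]
    return f_lut[a][phi]


if __name__ == "__main__":
    # 2-bit A selects the row, 2-bit B selects the column.
    exact = [[0, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1]]
    phi_lut, f_lut = decompose(exact)
    print(phi_lut, f_lut)
    print(evaluate(phi_lut, f_lut, a=2, b=1), exact[2][1])
```

In this sketch the first lookup table holds one bit per value of B and the second holds two bits per value of A, so the pair replaces a table with one bit per (A, B) combination. A table that fails `decompose` can be passed through `approximate` first, at the cost of some output error, which mirrors the approximate-then-decompose order described in claims 3 and 11.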
PCT/CN2022/128135 2021-11-01 2022-10-28 Multi-level lookup table circuit, function solving method and related device WO2023072226A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111283778.4A CN116070556A (en) 2021-11-01 2021-11-01 Multi-stage lookup table circuit, function solving method and related equipment
CN202111283778.4 2021-11-01

Publications (1)

Publication Number Publication Date
WO2023072226A1 true WO2023072226A1 (en) 2023-05-04

Family

ID=86159256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128135 WO2023072226A1 (en) 2021-11-01 2022-10-28 Multi-level lookup table circuit, function solving method and related device

Country Status (2)

Country Link
CN (1) CN116070556A (en)
WO (1) WO2023072226A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117634383A (en) * 2023-12-26 2024-03-01 苏州异格技术有限公司 Critical path delay optimization method, device, computer equipment and storage medium
CN117631752B (en) * 2024-01-25 2024-05-07 深圳市鼎阳科技股份有限公司 Waveform sequence creation method, display method and waveform sequence generator

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5224064A (en) * 1991-07-11 1993-06-29 Honeywell Inc. Transcendental function approximation apparatus and method
JP2004258799A (en) * 2003-02-24 2004-09-16 Kitakyushu Foundation For The Advancement Of Industry Science & Technology Lookup table cascade logic circuit
US20050273481A1 (en) * 2004-06-04 2005-12-08 Telefonaktiebolaget Lm Ericsson Pipelined real or complex ALU
CN102236539A (en) * 2011-07-19 2011-11-09 长沙景嘉微电子有限公司 Realization of exponentiation algorithm in designing of graphic chip
CN105573178A (en) * 2014-10-08 2016-05-11 中国科学院电子学研究所 Adaptive lookup table module with internal feedback

Also Published As

Publication number Publication date
CN116070556A (en) 2023-05-05

Similar Documents

Publication Publication Date Title
WO2023072226A1 (en) Multi-level lookup table circuit, function solving method and related device
US10096134B2 (en) Data compaction and memory bandwidth reduction for sparse neural networks
Xia et al. Switched by input: Power efficient structure for RRAM-based convolutional neural network
Lin et al. Learning the sparsity for ReRAM: Mapping and pruning sparse neural network for ReRAM based accelerator
US11580376B2 (en) Electronic apparatus and method for optimizing trained model
Sim et al. Scalable stochastic-computing accelerator for convolutional neural networks
CN112308204A (en) Automated neural network generation using fitness estimation
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
US11372929B2 (en) Sorting an array consisting of a large number of elements
Imani et al. NVQuery: Efficient query processing in nonvolatile memory
WO2022057433A1 (en) Machine learning model training method and related device
WO2020224035A1 (en) Digital integrated circuit layout method based on discrete optimization and terminal device
He et al. Stratification and enumeration of boolean functions by canalizing depth
JP2022530447A (en) Chinese word division method based on deep learning, equipment, storage media and computer equipment
US20210326756A1 (en) Methods of providing trained hyperdimensional machine learning models having classes with reduced elements and related computing systems
CN110110852B (en) Method for transplanting deep learning network to FPAG platform
WO2021253938A1 (en) Neural network training method and apparatus, and video recognition method and apparatus
CN114358319A (en) Machine learning framework-based classification method and related device
Qureshi et al. Sparse-PE: A performance-efficient processing engine core for sparse convolutional neural networks
CN111291792B (en) Flow data type integrated classification method and device based on double evolution
KR20230032748A (en) Apparatus and method for accelerating deep neural network learning for deep reinforcement learning
CN114730295A (en) Mode-based cache block compression
Yan et al. FLASH: FPGA locality-aware sensitive hash for nearest neighbor search and clustering application
TWI826040B (en) Quantum circuit design method and apparatus
US20220067494A1 (en) Accelerating device, data storing device, data processing system and operating method of accelerating device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22886101

Country of ref document: EP

Kind code of ref document: A1