CN117874241A - Text classification method and system based on DRAM-PIM table look-up type neural network reasoning and tuning - Google Patents


Publication number
CN117874241A
Authority
CN
China
Prior art keywords
lut
pim
neural network
dram
reasoning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410278591.2A
Other languages
Chinese (zh)
Other versions
CN117874241B (en)
Inventor
孙广宇
李聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202410278591.2A
Publication of CN117874241A
Application granted
Publication of CN117874241B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Devices For Executing Special Programs (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method and system based on DRAM-PIM table look-up neural network reasoning and tuning, comprising a host processor, controllers and in-memory computing (PIM) modules. Based on a DRAM in-memory computing architecture, efficient inference of the look-up-table neural network (LUT-NN) algorithm is realized by designing operators for it, and the optimal data-flow parameters of the look-up-table neural network under different deployment scenarios are further obtained through an automatic tuning algorithm, thereby realizing efficient tuning of the inference parameters; the text classification result is obtained through neural network inference. The technical scheme of the invention exploits the capability of the hardware platform to execute LUT-NN-based text classification tasks and improves compatibility with different text classification scenarios.

Description

Text classification method and system based on DRAM-PIM table look-up type neural network reasoning and tuning
Technical Field
The invention relates to text processing technology for look-up-table neural network inference, and in particular to a text classification method and system based on DRAM-PIM look-up-table neural network inference and tuning.
Background
Text processing places heavy demands on computation. The look-up-table neural network (LUT-based Neural Network, LUT-NN) algorithm features a small parameter scale, a small amount of computation and simple computational operations; it has therefore attracted wide attention from industry and academia in recent years and has been applied to natural language processing tasks such as text classification. The LUT-NN algorithm replaces the fully connected layers in a Transformer-based language model with LUT-NN layers as follows: the input matrix of the original fully connected layer is clustered into a small number of center vectors, and the pre-computed products of these center vectors and the weight matrix of the original fully connected layer are saved into a look-up table (LUT). During inference, the LUT-NN layer first finds the index of the center vector closest to the current input, then uses this index to query the LUT of the current layer and accumulates the retrieved rows to obtain the final result.
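To make the replacement concrete, the following is a minimal NumPy sketch of a single LUT-NN layer as described above; the function names (build_lut, lut_nn_forward), the shape notation and the offline table-building step are illustrative assumptions for exposition, not the patented implementation.

```python
import numpy as np

def build_lut(centers, weight):
    # centers: (C, K, L) center vectors; weight: (C*L, M) weights of the original
    # fully connected layer. Pre-compute each group's partial products -> LUT (C, K, M).
    C, K, L = centers.shape
    M = weight.shape[1]
    lut = np.empty((C, K, M), dtype=weight.dtype)
    for c in range(C):
        # rows c*L .. (c+1)*L of the weight matrix correspond to the c-th sub-vector
        lut[c] = centers[c] @ weight[c * L:(c + 1) * L]
    return lut

def lut_nn_forward(x, centers, lut):
    # x: (N, D) input with D = C*L; centers: (C, K, L); lut: (C, K, M)
    N, D = x.shape
    C, K, L = centers.shape
    subs = x.reshape(N, C, L)                              # split rows into C sub-vectors
    # nearest-center index per (row, group) under L2 distance -> (N, C)
    dist = np.linalg.norm(subs[:, :, None, :] - centers[None], axis=-1)
    idx = dist.argmin(axis=-1)
    # gather the pre-computed partial products and accumulate over groups -> (N, M)
    return lut[np.arange(C)[None, :], idx].sum(axis=1)
```

The point of the design is that the expensive matrix multiplication is paid once offline when the LUT is built, so inference only needs distance computations and table lookups with accumulation.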
By converting the matrix multiplications of fully connected layers into LUT query operations in LUT-NN layers, the LUT-NN algorithm reduces the computational demand of the model for text processing. However, existing central processing unit (CPU)-centric computing architectures deliver low inference performance for the LUT-NN algorithm. The main reason is that the frequent accesses to the LUTs of every LUT-NN layer during inference make the inference process memory-access intensive. Experiments show that, at inference time, LUT-NN falls in the memory-bound region of the roofline model of a CPU-centric computing platform. This memory-bound behavior prevents LUT-NN from fully utilizing the computing resources provided by a CPU-centric platform, making it difficult to effectively reduce the computational cost of the model for text classification.
Disclosure of Invention
In order to overcome the above deficiencies of the prior art, the invention provides a text classification method and system for LUT-NN inference and tuning based on a dynamic random access memory (DRAM) processing-in-memory architecture (DRAM-based Processing-in-Memory Architecture, DRAM-PIM). Taking full account of the architectural characteristics of DRAM-PIM, the method customizes the operator design of LUT-NN on the DRAM-PIM architecture so as to make full use of hardware resources and accelerate the LUT-NN inference process. In addition, considering that the configurations of DRAM-PIM and LUT-NN change across deployment scenarios, the method extracts the variable parameters of the operators and customizes an automatic tuning algorithm for them, thereby obtaining the optimal data-flow parameters of LUT-NN for different deployments, improving deployment efficiency across scenarios and improving text classification performance.
The technical scheme provided by the invention is as follows:
A text classification method based on DRAM-PIM look-up-table neural network inference and tuning. The method achieves efficient inference of the LUT-NN algorithm by customizing its operator design; on this basis, it provides an automatic tuning algorithm that achieves efficient tuning of the inference parameters, and the text classification result is obtained through neural network inference. The method comprises the following steps:
1) Design function interfaces based on a host-processor programming framework and a PIM-module programming framework. The user provides the network configuration of the look-up-table neural network, and the neural network inference code is written for that LUT-NN configuration using the programming frameworks and the function interfaces designed by the invention.
The function interfaces designed by the invention, based on the host-processor programming framework and the PIM-module programming framework, contain the operators required by the look-up-table neural network, including the nearest-center query operator, the LUT query operator and other operators.
2) The user provides the hardware configuration of the DRAM-PIM computing platform used for inference. Given this hardware configuration and the network configuration of the look-up-table neural network, the method designs and uses an inference-parameter design-space exploration algorithm to find the optimal inference parameters and injects them into the compiler used to compile the inference program.
3) The user program is converted into an executable binary file by the compiler. After compilation, the executable binary file is sent to the text classification system based on DRAM-PIM look-up-table neural network inference and tuning and waits to execute text classification tasks. The classification system comprises a host processor (module), controllers (modules) and in-memory computing (Processing in Memory, PIM) modules.
The host processor executes operations and controls all PIM modules. The host processor contains several controllers for exchanging data and instructions between the host processor and the PIM modules. Each controller is connected to several PIM modules, and the PIM modules connected to the same controller share a data path to the host processor. Each PIM module contains several compute nodes, which share that data path to the host processor. The local memory of each compute node contains one or more DRAM arrays, and each compute node contains an arithmetic unit.
4) The invention uses the system to execute text classification tasks through the following steps:
4.1) The system receives the text input specified by the user, loads the binary file obtained in step 3), and loads the LUT-NN weight parameters provided by the user according to the parameter information in the binary file.
4.2) After loading is complete, the program starts to run. The system executes all operators in turn according to the user-defined model structure. For each operator invoked by the user, the system automatically determines, from the operator's category, the hardware required for execution and dispatches the call to that hardware; data transfers between different hardware are performed automatically at the same time.
4.3) After all operators have been executed, the user obtains the text classification result produced by neural network inference.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a text classification method and system for LUT-NN inference and tuning based on a DRAM processing-in-memory architecture, with the following technical advantages:
First, compared with traditional neural network inference methods, the LUT-NN inference and tuning method based on the DRAM in-memory computing architecture improves both throughput and energy efficiency.
Second, the method fully considers the affinity between the different hardware in the DRAM-PIM system and the LUT-NN inference operators, and can fully exploit the capability of the hardware platform to execute LUT-NN-based text classification tasks.
Third, the inference-parameter design-space exploration algorithm makes the method easier to port to different DRAM-PIM platforms and LUT-NN configurations, improving compatibility with different text classification scenarios.
Drawings
FIG. 1 is a schematic diagram of a DRAM-PIM architecture;
wherein: 1 - host processor; 2 - controller; 3 - PIM module; 4 - compute node; 5 - local memory; 6 - arithmetic unit.
FIG. 2 is a tensor diagram of the LUT-NN layer involved in inference;
wherein: 1 - input tensor; 2 - center vector tensor; 3 - LUT index tensor; 4 - LUT tensor.
Fig. 3 is a flow chart of the inference parameter design space exploration algorithm provided by the invention.
FIG. 4 is a block flow diagram of a LUT-NN inference text classification method provided by the invention.
Detailed Description
The invention is further described below by way of examples with reference to the accompanying drawings, which in no way limit the scope of the invention.
FIG. 1 illustrates the DRAM-PIM hardware architecture used by the invention. Reference numeral 1 is the host processor, which executes operations and controls all PIM modules. Reference numeral 2 is a controller on the host processor, which is responsible for exchanging data and instructions between the host processor and the PIM modules. Reference numeral 3 is an in-memory computing (Processing in Memory, PIM) module; each controller is connected to several PIM modules, and the PIM modules connected to the same controller share a data path to the host processor. Within each PIM module, reference numeral 4 is a compute node; each PIM module contains several compute nodes, which share the data path to the host processor. Within each compute node, reference numeral 5 is the node's local memory, which contains one or more DRAM arrays. Reference numeral 6 is the arithmetic unit of the compute node; each compute node contains one arithmetic unit.
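For later reference, the hierarchy of FIG. 1 can be summarized by a small configuration structure; the Python dataclasses below are an illustrative assumption of this description, not an interface defined by the invention.

```python
from dataclasses import dataclass

@dataclass
class ComputeNodeConfig:
    dram_arrays: int           # DRAM arrays in the node's local memory (5)
    ops_per_second: float      # compute capability of the node's arithmetic unit (6)

@dataclass
class PimModuleConfig:
    nodes_per_module: int      # compute nodes (4) sharing one data path to the host

@dataclass
class DramPimConfig:
    num_modules: int                   # total number of PIM modules (3)
    module: PimModuleConfig
    node: ComputeNodeConfig
    host_bandwidth_bytes_per_s: float  # total controller<->PIM transfer bandwidth

    @property
    def total_nodes(self) -> int:
        return self.num_modules * self.module.nodes_per_module
```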
Fig. 2 shows a tensor diagram of the LUT-NN layer involved in inference. Reference numeral 1 is the input tensor of the LUT-NN layer, whose shape is (N, D), where N is the number of rows of the input tensor and D is the length of each row vector. The input tensor may be a user-specified input or the output of the preceding neural network layer.
Reference numeral 2 is the center vector tensor of the LUT-NN layer, whose shape is (C, K, L), where C is the number of center vector groups, K is the number of center vectors in each group, and L is the length of each center vector. In particular, the shape of the center vector tensor and the shape of the input tensor must satisfy C × L = D. The center vector tensor is specified by the (pre-trained) LUT-NN weight parameters provided by the user.
Reference numeral 3 is the LUT index tensor of the LUT-NN layer, whose shape is (N, C), where N is the number of rows of the input tensor and C is the number of center vector groups. The LUT index tensor is an intermediate inference result computed from the input tensor and the center vector tensor.
Reference numeral 4 is the LUT tensor of the LUT-NN layer, whose shape is (C, K, M), where C is the number of LUT groups (equal to the number of center vector groups), K is the number of row vectors in each LUT group (equal to the number of center vectors in each group), and M is the row vector length of the output matrix of the current LUT-NN layer. The LUT tensor is specified by the LUT-NN weight parameters provided by the user.
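Under the definitions above, the tensor shapes and the constraint between them can be stated compactly; the following snippet is only a notation check with illustrative sizes, not part of the invention.

```python
import numpy as np

N, C, K, L, M = 128, 16, 16, 8, 256     # illustrative sizes
D = C * L                               # input row length must equal C * L

x       = np.zeros((N, D), dtype=np.float32)     # 1: input tensor
centers = np.zeros((C, K, L), dtype=np.float32)  # 2: center vector tensor
idx     = np.zeros((N, C), dtype=np.int32)       # 3: LUT index tensor (intermediate)
lut     = np.zeros((C, K, M), dtype=np.float32)  # 4: LUT tensor

assert x.shape[1] == centers.shape[0] * centers.shape[2]
```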
The look-up-table neural network inference and tuning method of the invention comprises the following steps:
1. The invention provides programmers with a set of function interfaces, based on a host-processor programming framework and a PIM-module programming framework, that contain the operators required by LUT-NN networks for text processing. The operators are classified as follows:
a) Nearest-center query operator: the operator takes as inputs the tensor of the input text processed by the preceding neural network layers and the center vector tensor from the LUT-NN weight parameters provided by the user. The operator first divides every row vector of the input tensor into sub-vectors of length L, then computes the L2 distance between each sub-vector and all center vectors in the corresponding center vector group (the L2 distance between vectors x and y is defined as ||x - y||_2 = sqrt(sum_i (x_i - y_i)^2)), and finally obtains the intra-group index of the center vector with the shortest distance. The indices of all sub-vectors are organized into a tensor of shape (N, C), which is the output of the operator. This operator is implemented with the host-processor programming framework of DRAM-PIM and is executed by the host processor.
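A minimal host-side sketch of the nearest-center query operator under the shape notation above; the NumPy-based function below is an assumption for illustration rather than the host-processor programming framework itself.

```python
import numpy as np

def nearest_center_query(x, centers):
    """x: (N, C*L) input tensor; centers: (C, K, L) center vector tensor.
    Returns the LUT index tensor of shape (N, C) with intra-group indices."""
    C, K, L = centers.shape
    N = x.shape[0]
    subs = x.reshape(N, C, L)                      # split each row into C sub-vectors
    # L2 distance of every sub-vector to every center of its group: (N, C, K)
    dist = np.linalg.norm(subs[:, :, None, :] - centers[None], axis=-1)
    return dist.argmin(axis=-1).astype(np.int32)   # index of the closest center
```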
b) LUT query operator: the operator takes as inputs the LUT index tensor and the LUT tensor of the LUT-NN layer. This operator is implemented with the PIM-module programming framework of DRAM-PIM and is executed by all PIM modules. In a specific implementation, the LUT query operator comprises the following execution steps:
i. Divide the LUT index tensor along its N dimension into P_N parts, obtaining P_N LUT index slices, and divide the LUT tensor along its M dimension into P_M parts, obtaining P_M LUT slices. Let P be the total number of compute nodes on all PIM modules; the partition satisfies P = P_N × P_M.
ii. Divide the compute nodes on all PIM modules into P_N groups, each group containing P_M compute nodes. The host processor broadcasts the i-th LUT index slice (1 ≤ i ≤ P_N) to the i-th group, and broadcasts the j-th LUT slice (1 ≤ j ≤ P_M) to the j-th compute node of every group.
iii. After the data transfer is finished, every compute node starts to compute its assigned task. Each compute node divides its LUT index slice along the N and C dimensions into S_N and S_C parts respectively, obtaining S_N × S_C LUT index sub-slices, and divides its LUT slice along the C dimension into S_C parts, obtaining S_C LUT sub-slices. During execution, the compute node loads each LUT index sub-slice in turn; for the current LUT index sub-slice, it loads each corresponding LUT sub-slice in turn, reads the row vectors indicated by the index values in the current LUT index sub-slice, and accumulates them onto the corresponding result rows. This step continues until all LUT index sub-slices have been traversed.
iv. After the computation is finished, the host processor collects the computation results of all compute nodes as the output of the operator.
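The per-node portion of steps iii and iv can be sketched as follows; the loop structure and names are illustrative assumptions, and the real PIM-module code operates on the node's local DRAM arrays rather than NumPy arrays.

```python
import numpy as np

def lut_query_on_node(idx_slice, lut_slice, s_n, s_c):
    """idx_slice: (n, C) LUT index slice assigned to this node;
    lut_slice: (C, K, m) LUT slice assigned to this node.
    Returns the node's partial output of shape (n, m)."""
    n, C = idx_slice.shape
    m = lut_slice.shape[2]
    out = np.zeros((n, m), dtype=lut_slice.dtype)
    n_tiles = np.array_split(np.arange(n), s_n)     # tile rows into S_N parts
    c_tiles = np.array_split(np.arange(C), s_c)     # tile groups into S_C parts
    for rows in n_tiles:                            # iterate LUT index sub-slices
        for cols in c_tiles:                        # matching LUT sub-slice along C
            for c in cols:
                # read the row vectors selected by the index values and accumulate
                out[rows] += lut_slice[c, idx_slice[rows, c]]
    return out
```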
c) Other operators: the non-LUT-NN-layer operators required to build a complete LUT-NN, including element-wise operators, the Softmax operator, activation function operators, regularization operators, the word-segmentation operator, the embedding-vector conversion operator and the language-model head operator. These operators are implemented with the host-processor programming framework of the DRAM-PIM architecture and are executed by the host processor.
2. The user defines the network structure of the LUT-NN using the function interfaces described above. After obtaining the user's network structure and the DRAM-PIM hardware parameters, the invention selects the optimal parameter values for each LUT query operator through the inference-parameter design-space exploration algorithm; the tunable parameters of each operator include the partition factors P_N and P_M and the intra-node tiling factors S_N and S_C defined above. The flow of the algorithm is shown in FIG. 3 and specifically comprises the following steps:
a) The user provides the LUT-NN network configuration and the configuration of the DRAM-PIM hardware platform. The LUT-NN network configuration includes the network structure graph of the LUT-NN and the shape parameters (N, C, K, M) of each LUT-NN layer, where N is the number of rows of the input tensor, C is the number of center vector groups of the center vector tensor, K is the number of center vectors in each group, and M is the row vector length of the output matrix of the current LUT-NN layer, as shown in FIG. 2. The configuration of the DRAM-PIM hardware platform includes the total number of PIM modules, the total number of compute nodes on each PIM module, the computing capability of the arithmetic unit in each compute node, and the total bandwidth of data transfer between the PIM modules and the host processor. In terms of FIG. 1, the PIM modules are the elements labeled 3, the compute nodes on each PIM module are the elements labeled 4, the arithmetic unit of each compute node is the element labeled 6, and the total data-transfer bandwidth is the bandwidth of the data path between the controllers (2) and the PIM modules (3).
b) Given the above configuration, the algorithm traverses all LUT query operators to test and tune them. If there remain LUT query operators that have not been tuned by steps b)-f), the algorithm selects one of them, initializes its optimal parameter record to empty and its lowest inference overhead record to the maximum value representable by an unsigned integer, and then jumps to step c). Otherwise, it jumps to step g).
c) If the current LUT query operator still has (P_N, P_M) parameter pairs that have not been tested by steps c)-e), the algorithm selects one of them and estimates the total data transfer overhead T_trans of the current parameter pair as T_trans = T_idx + T_lut + T_out, where T_idx is the transfer overhead of the LUT index tensor, T_lut is the transfer overhead of the LUT tensor, and T_out is the transfer overhead of the operator output. For a tensor x (the LUT index tensor, the LUT tensor or the output tensor), T_x is estimated as T_x = (s_x × P) / BW_x, where s_x is the size of the per-node slice of tensor x, P is the total number of compute nodes, and BW_x is the bandwidth available to the host processor when transferring tensor x. After the estimation, the algorithm jumps to step d). Once all (P_N, P_M) parameter pairs have been evaluated, the algorithm jumps to step f).
d) For the current (P_N, P_M) parameter pair, if there remain untested (S_N, S_C) parameter pairs, the algorithm selects one of them and estimates the computation overhead T_comp of the current parameter pair as T_comp = n_idx × n_lut × t_pair, where n_idx is the number of LUT index sub-slices on a single compute node, n_lut is the number of LUT sub-slices on a single compute node, and t_pair is the operation latency between a single LUT index sub-slice and a single LUT sub-slice. After the estimation, the algorithm jumps to step e). Once all (S_N, S_C) parameter pairs have been evaluated, the algorithm jumps back to step c).
e) For the current complete parameter pair (P_N, P_M, S_N, S_C), if the current total overhead (the data transfer overhead plus the computation overhead) is smaller than the current lowest inference overhead record, the algorithm updates the lowest inference overhead record with the current total overhead and updates the optimal parameter record with the current complete parameter pair; otherwise, it keeps the current lowest inference overhead record and optimal parameter record. After updating, the algorithm jumps to step d).
f) Tuning of the current LUT query operator is finished; the algorithm saves the optimal parameter record of the current operator and jumps back to step b).
g) The inference-parameter design-space exploration algorithm ends, and the optimal parameter records of all LUT query operators are saved for subsequent execution.
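A compact sketch of steps a) to g) for a single operator, using the cost model described above; the candidate enumeration by divisors, the single shared bandwidth value and the per-pair latency callback are simplifying assumptions made for illustration, not the exact algorithm of the invention.

```python
from itertools import product

UINT_MAX = 2**32 - 1

def divisors(n):
    # candidate partition/tiling factors; enumerating divisors is an assumption
    return [d for d in range(1, n + 1) if n % d == 0]

def explore_operator(shape, hw):
    """shape: dict with N, C, K, M of one LUT query operator (FIG. 2 notation).
    hw: dict with total_nodes, bw (bytes/s), bytes_per_elem, and pair_latency,
    a callable estimating the latency of one (index sub-slice, LUT sub-slice) pair.
    Returns (best_params, best_cost) following steps b) to f)."""
    best_params, best_cost = None, UINT_MAX
    P = hw["total_nodes"]
    for p_n in divisors(P):                      # step c): coarse partition pairs
        p_m = P // p_n                           # constraint P = P_N * P_M
        # per-node slice sizes in bytes (index slice, LUT slice, output slice)
        s_idx = (shape["N"] // p_n) * shape["C"] * hw["bytes_per_elem"]
        s_lut = shape["C"] * shape["K"] * (shape["M"] // p_m) * hw["bytes_per_elem"]
        s_out = (shape["N"] // p_n) * (shape["M"] // p_m) * hw["bytes_per_elem"]
        t_trans = sum(s * P / hw["bw"] for s in (s_idx, s_lut, s_out))
        for s_n, s_c in product(divisors(shape["N"] // p_n), divisors(shape["C"])):
            # step d): intra-node tiling pairs
            n_idx_sub = s_n * s_c                # LUT index sub-slices per node
            n_lut_sub = s_c                      # LUT sub-slices per node
            rows = shape["N"] // p_n // s_n      # rows per index sub-slice
            grps = shape["C"] // s_c             # center-vector groups per sub-slice
            t_comp = n_idx_sub * n_lut_sub * hw["pair_latency"](rows, grps, shape["M"] // p_m)
            total = t_trans + t_comp             # step e): keep the cheapest pair
            if total < best_cost:
                best_cost, best_params = total, (p_n, p_m, s_n, s_c)
    return best_params, best_cost
```

In practice the per-pair latency would be measured or modeled from the compute capability given in the hardware configuration.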
3. After the optimal parameters of all LUT query operators have been obtained, they are injected into the compiler. The user-defined LUT-NN network is compiled into a binary file and loaded into the host processor and the PIM modules. The DRAM-PIM computing system then loads the LUT-NN network weight parameters provided by the user. After loading is complete, the system starts to execute LUT-NN inference, following the flow shown in FIG. 4, which specifically comprises the following steps:
a) The user provides the text input to be processed to the DRAM-PIM computing system. After the system receives the text input, execution of the operators of the LUT-NN network begins.
b) If there are still unexecuted operators, the host processor initializes the input of the current operator. If the operator is the starting operator, the host processor uses the user-specified text input as its input; otherwise, the host processor uses the output of the predecessor operator as the input of the current operator according to the connection relations between operators. After input initialization, the flow jumps to step c). If all operators have been executed, it jumps to step e).
c) If the current operator is a LUT query operator, the PIM modules perform its computation; otherwise, the host processor performs it.
d) After the computation of the current operator is finished, the host processor saves its output, and the flow jumps back to step b).
e) The inference flow ends, and the user obtains the text classification result.
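The control flow of steps a) to e) amounts to the dispatcher loop sketched below; the operator representation and the run_on_host / run_on_pim hooks are illustrative assumptions, not the system's actual runtime interface.

```python
def run_inference(operators, text_input, run_on_host, run_on_pim):
    """operators: list of dicts, each with a 'kind' field and 'inputs'
    (indices of predecessor operators, or None for the start operator).
    Returns the output of the last operator, i.e. the text classification result."""
    outputs = []
    for op in operators:                                  # step b): next operator
        if op["inputs"] is None:                          # start operator
            args = [text_input]
        else:                                             # outputs of predecessors
            args = [outputs[i] for i in op["inputs"]]
        if op["kind"] == "lut_query":                     # step c): PIM modules
            result = run_on_pim(op, args)
        else:                                             # host processor
            result = run_on_host(op, args)
        outputs.append(result)                            # step d): save output
    return outputs[-1]                                    # step e): final result
```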
It should be noted that the disclosed embodiments are intended to aid further understanding of the invention, and those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments; the scope of protection of the invention is defined by the claims.

Claims (9)

1. A text classification method based on DRAM-PIM table look-up neural network reasoning and tuning, characterized in that, based on a dynamic random access memory processing-in-memory (DRAM-PIM) architecture, efficient inference of the look-up-table neural network LUT-NN algorithm is realized by designing operators of the LUT-NN algorithm, and the optimal data-flow parameters of the look-up-table neural network under different deployment scenarios are further obtained through an automatic tuning algorithm, thereby realizing efficient tuning of the inference parameters; a text classification result is obtained through neural network inference; the method comprises the following steps:
1) Designing function interfaces based on a host-processor programming framework and a PIM-module programming framework, the function interfaces comprising a nearest-center query operator and a LUT query operator of the look-up-table neural network; generating look-up-table neural network inference code according to the function interfaces and the network configuration of the look-up-table neural network; the nearest-center query operator being used to obtain the LUT index tensor;
2) Designing an inference parameter design space exploration algorithm, searching to obtain an optimal inference parameter value according to the hardware configuration of the DRAM-PIM computing platform and the network configuration of the table look-up type neural network, and injecting the optimal inference parameter value into a compiler for compiling an inference program;
the inference parameter design space exploration algorithm comprises the following steps:
21 Acquiring network configuration of a look-up table type neural network and hardware configuration of a DRAM-PIM computing platform;
22) Traversing all LUT query operators for tuning; the parameters of each operator include P_N, P_M, S_N and S_C, wherein P_N is the number of parts into which the LUT index tensor is divided along its N dimension and P_M is the number of parts into which the LUT tensor is divided along its M dimension; N represents the number of rows of the input tensor; C represents the number of center vector groups of the center vector tensor; M represents the row vector length of the output matrix of the current LUT-NN layer; each compute node divides its LUT index slice along the N and C dimensions into S_N and S_C parts respectively, obtaining S_N × S_C sub-slices, and divides its LUT slice along the C dimension into S_C parts, obtaining S_C sub-slices;
if the non-optimal LUT query operator exists, selecting one of the operators, initializing an optimal parameter record and a lowest reasoning overhead record, and then executing the step 23); otherwise, jumping to step 27);
23 Estimating the total data transmission overhead of the current parameter pair;
if the current LUT query operator still has untested (P_N, P_M) parameter pairs, selecting one (P_N, P_M) parameter pair and estimating the total data transfer overhead T_trans of the current (P_N, P_M) parameter pair;
if all (P_N, P_M) parameter pairs have been evaluated, jumping to step 26);
24 Estimating the calculation overhead of the current parameter pair;
for the current (P_N, P_M) parameter pair, if there remain untested (S_N, S_C) parameter pairs, selecting one of them and estimating its computation overhead T_comp;
if all (S_N, S_C) parameter pairs have been evaluated, jumping to step 23);
25 Updating the current lowest reasoning overhead record and the optimal parameter record;
for the current complete parameter pair (P_N, P_M, S_N, S_C), if the current total overhead T_trans + T_comp is less than the current lowest inference overhead record, updating the current lowest inference overhead record with the current total overhead and updating the optimal parameter record with the current complete parameter pair; after updating, jumping to step 24);
26 Completing tuning of the current LUT query operator, saving the optimal parameter record of the current operator, and jumping to the step 22);
27 Ending the algorithm to obtain the optimal parameter records of all LUT query operators;
3) Converting the user program into an executable binary file through a compiler;
4) Performing text classification tasks:
4.1 Receiving text input of a user, loading a binary file, and loading LUT-NN weight parameters provided by the user according to parameter information in the binary file;
4.2 Executing and calling all operators in sequence according to a model structure defined by a user; judging hardware required for execution according to the category of the operator for each operator to be called, and calling the hardware; simultaneously, data transmission among different hardware is automatically executed;
4.3 After all operators are executed, obtaining a text classification result obtained by neural network reasoning.
2. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 1, wherein the LUT-NN network configuration comprises the network structure graph of the LUT-NN and the shape parameters (N, C, K, M) of each LUT-NN layer, wherein N represents the number of rows of the input matrix, C represents the number of center vector groups of the center vector tensor, K represents the number of center vectors in each center vector group, and M represents the row vector length of the output matrix of the current LUT-NN layer.
3. The method for text classification based on DRAM-PIM look-up table neural network reasoning and tuning of claim 2, wherein the configuration of the DRAM-PIM hardware platform includes a total number of PIM modules, a total number of computing nodes on each PIM module, a computing power of an arithmetic unit within each computing node, and a total bandwidth of data transmission between the PIM modules and the host processor.
4. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 2, wherein the LUT query operator comprises the following steps:
A. dividing the LUT index tensor along its N dimension into P_N parts, obtaining P_N LUT index slices; dividing the LUT tensor along its M dimension into P_M parts, obtaining P_M LUT slices; the total number P of compute nodes on all PIM modules satisfying P = P_N × P_M;
B. dividing the compute nodes on all PIM modules into P_N groups, each group comprising P_M compute nodes; the host processor broadcasting the i-th LUT index slice to all compute nodes of the i-th group, and broadcasting the j-th LUT slice to the j-th compute node of each group, 1 ≤ i ≤ P_N, 1 ≤ j ≤ P_M;
C. after the data transfer is finished, each compute node starting to compute its assigned task;
each compute node dividing its LUT index slice along the N and C dimensions into S_N and S_C parts respectively, obtaining S_N × S_C sub-slices, and dividing its LUT slice along the C dimension into S_C parts, obtaining S_C sub-slices; during execution, the compute node loading each LUT index sub-slice in turn; for the current LUT index sub-slice, the compute node loading each LUT sub-slice in turn, reading the corresponding row vectors according to the index values in the current LUT index sub-slice, and accumulating the row vectors onto the corresponding result rows;
D. traversing all LUT index sub-slices; after the computation is finished, the host processor obtaining the computation result.
5. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 1, wherein the total data transfer overhead T_trans of a (P_N, P_M) parameter pair is estimated as T_trans = T_idx + T_lut + T_out, wherein T_idx is the transfer overhead of the LUT index tensor, T_lut is the transfer overhead of the LUT tensor, and T_out is the transfer overhead of the operator output.
6. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 5, wherein T_x is estimated as T_x = (s_x × P) / BW_x, wherein x is the LUT index tensor, the LUT tensor or the output tensor, s_x is the size of the per-node slice of tensor x, P is the total number of compute nodes, and BW_x is the bandwidth available to the host processor when transferring tensor x.
7. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 6, wherein in step 24) the computation overhead of a parameter pair is estimated as T_comp = n_idx × n_lut × t_pair, wherein n_idx is the number of LUT index sub-slices on a single compute node, n_lut is the number of LUT sub-slices on a single compute node, and t_pair is the operation latency between a single LUT index sub-slice and a single LUT sub-slice.
8. The text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 1, wherein the function interfaces designed based on the host-processor programming framework and the PIM-module programming framework further comprise other operators of the look-up-table neural network, including element-wise operators, a Softmax operator, activation function operators, regularization operators, a word-segmentation operator, an embedding-vector conversion operator and a language-model head operator.
9. A text classification system based on DRAM-PIM table look-up neural network reasoning and tuning, characterized in that the text classification system is used to implement the text classification method based on DRAM-PIM table look-up neural network reasoning and tuning of claim 1; the text classification system comprises a host processor, controllers and in-memory computing PIM modules;
the host processor is used to execute operations and control all in-memory computing PIM modules; the host processor comprises a plurality of controllers for exchanging data and instructions between the host processor and the PIM modules; each controller is connected to a plurality of in-memory computing PIM modules; the PIM modules connected to the same controller share a data path to the host processor; each PIM module comprises a plurality of compute nodes sharing a data path to the host processor; the local memory of each compute node contains one or more DRAM arrays; and each compute node comprises an arithmetic unit.
CN202410278591.2A 2024-03-12 2024-03-12 Text classification method and system based on DRAM-PIM table look-up type neural network reasoning and tuning Active CN117874241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410278591.2A CN117874241B (en) 2024-03-12 2024-03-12 Text classification method and system based on DRAM-PIM table look-up type neural network reasoning and tuning


Publications (2)

Publication Number Publication Date
CN117874241A true CN117874241A (en) 2024-04-12
CN117874241B CN117874241B (en) 2024-05-17

Family

ID=90595361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410278591.2A Active CN117874241B (en) 2024-03-12 2024-03-12 Text classification method and system based on DRAM-PIM table look-up type neural network reasoning and tuning

Country Status (1)

Country Link
CN (1) CN117874241B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065111A1 (en) * 2017-08-31 2019-02-28 Micron Technology, Inc. Apparatuses and methods for in-memory operations
US20210357739A1 (en) * 2020-05-14 2021-11-18 Micron Technology, Inc. Memory device to train neural networks
US11354134B1 (en) * 2021-03-25 2022-06-07 Micron Technology, Inc. Processing-in-memory implementations of parsing strings against context-free grammars
US20220391128A1 (en) * 2021-06-07 2022-12-08 Intel Corporation Techniques to repurpose static random access memory rows to store a look-up-table for processor-in-memory operations
US20230385258A1 (en) * 2021-08-31 2023-11-30 University Of Virginia Patent Foundation Dynamic random access memory-based content-addressable memory (dram-cam) architecture for exact pattern matching
CN115982418A (en) * 2023-03-17 2023-04-18 亿铸科技(杭州)有限责任公司 Method for improving super-division operation performance of AI (Artificial Intelligence) computing chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN Pu; YANG Xuejun; TANG Yuhua: "Research on the Parcels communication mechanism in high-performance parallel PIM systems", Journal of Chinese Computer Systems (小型微型计算机系统), no. 03, 21 March 2006 (2006-03-21), pages 554-557 *

Also Published As

Publication number Publication date
CN117874241B (en) 2024-05-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant