Heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning

Publication number: CN117971354A (legal status: pending)
Application number: CN202410372273.2A
Applicant: Suzhou Metabrain Intelligent Technology Co Ltd
Inventors: 童浩南, 任智新, 张闯
Original language: Chinese (zh)
Classification: Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

The invention provides a heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning, relating to the field of computer technology. A data control flow is acquired through a local hardware device, and a non-deterministic finite automaton is generated from a generated regular expression; the non-deterministic finite automaton is used to parse and filter the data control flow. In a heterogeneous device, the non-deterministic finite automaton is analyzed by a graph convolutional neural network model with an integrated clustering module; when the non-deterministic finite automaton matches the regular expression it represents, it is configured onto a corresponding regular engine to parse and filter the data control flow in parallel. The graph convolutional neural network model of the integrated clustering module is obtained by jointly training a differentiable clustering module and a graph convolutional neural network, and can verify more effectively whether the non-deterministic finite automaton represents the regular rule, thereby achieving more efficient regular expression matching.

Description

Heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a heterogeneous acceleration method, apparatus, device, and storage medium based on end-to-end learning.
Background
At present, software solutions on a central processing unit (CPU) or a graphics processing unit (GPU) quickly become compute-bound as the complexity of the expression increases, so heterogeneous acceleration using dedicated hardware acceleration devices is widely adopted: on the one hand, big-data workloads can be offloaded onto a hardware acceleration card to obtain a higher speed-up ratio; on the other hand, part of the CPU or GPU pressure can be released. The field programmable gate array (FPGA) is a common heterogeneous acceleration card widely used in data centers. An FPGA can compile a regular expression directly into a non-deterministic finite automaton (NFA), whose matching paths are built immediately when an input string is encountered, thereby accelerating software applications running on the CPU or GPU. However, as the complexity of the regular expression increases, the corresponding non-deterministic finite automaton may become very large and complex. Since it cannot be verified whether the non-deterministic finite automaton accurately expresses the content of the regular expression, a highly complex regular expression may yield an unreasonable non-deterministic finite automaton, so that the matching efficiency is low when the non-deterministic finite automaton processes large-scale text data, affecting the heterogeneous device's execution of acceleration tasks.
Disclosure of Invention
The invention provides a heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning, to overcome the defects in the related art that, because it cannot be verified whether a non-deterministic finite automaton accurately expresses the content of a regular expression, a highly complex regular expression may yield an unreasonable non-deterministic finite automaton, so that the matching efficiency is low when the non-deterministic finite automaton processes large-scale text data, affecting the heterogeneous device's execution of acceleration tasks.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which comprises the following steps:
Acquiring a data control flow through a local hardware device, and generating a non-deterministic finite automaton according to a generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow;
Receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of an integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
According to the heterogeneous acceleration method based on end-to-end learning provided by the invention, analyzing the non-deterministic finite automaton based on the graph convolutional neural network model of the integrated clustering module comprises the following steps:
converting the nondeterministic finite automaton into a state machine undirected graph;
inputting each node in the state machine undirected graph into the graph convolution neural network, and obtaining an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters; the graph convolution neural network and the differentiable clustering module are obtained through combined training;
Acquiring the correlation among all characters in the regular expression corresponding to the non-deterministic finite automaton;
And when the correlation between each regular character cluster and each character in the regular expression is consistent, judging that the non-deterministic finite automata is in a matching relation with the regular expression represented by the non-deterministic finite automata.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which converts the nondeterministic finite automaton into a state machine undirected graph, and comprises the following steps:
acquiring a state machine topological structure of the non-deterministic finite automaton;
Taking each edge from one node to another node in the state machine topology as an edge node in the state machine undirected graph;
In a state machine undirected graph, if two edge nodes are connected through a state in the state machine topology, creating an undirected edge between the two edge nodes;
a state machine undirected graph structure is generated based on the plurality of edge nodes and the plurality of undirected edges.
According to the heterogeneous acceleration method based on end-to-end learning, labels are arranged on the edges from each node to another node in the state machine topology; if a plurality of edges in the state machine topology have the same label and are connected to the same state, those edges are treated as a single edge node of the state machine undirected graph structure.
According to the invention, the graph convolution neural network comprises a plurality of graph convolution layers, each node in the state machine undirected graph is input into the graph convolution neural network, and an embedded vector of each node is obtained, wherein the method comprises the following steps:
generating an adjacency matrix based on the state machine undirected graph, wherein the adjacency matrix comprises neighbor nodes of each node and characteristics thereof;
feature aggregation is carried out on each node in the state machine undirected graph and its neighbor nodes through at least one graph convolution layer;
and combining the aggregated neighbor features with the self features to obtain the embedded vector of each node.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which inputs the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters, and comprises the following steps:
and the differentiable clustering module distributes each node into the nearest cluster class according to the distance between the embedded vector of each node and the clustering center to obtain a plurality of regular character clusters.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which is obtained by carrying out joint training on a graph convolution neural network and a differentiable clustering module, and comprises the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of nodes and corresponding labels;
Inputting the training data set into the graph convolution neural network to acquire an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters;
Calculating loss values corresponding to the regular character clusters based on the comprehensive loss function;
Performing back propagation and updating the model parameters through the optimizer according to the loss value until the training end condition is met, obtaining a trained graph convolutional neural network model of the integrated clustering module;
The comprehensive loss function comprises classification loss, clustering loss and regularization term of a Laplace matrix of the nodes;
the calculating the loss value of the embedded vector based on the comprehensive loss function includes:
Calculating the difference between the embedded vectors corresponding to the labeled node data and the real labels by using a cross entropy loss function;
using the regularization term of the Laplace matrix, which exploits the structural information of the state machine undirected graph, to encourage adjacent unlabeled nodes to have similar feature representations;
and calculating the average value of the distance from each node to the nearest clustering center by using the clustering loss.
The invention provides a heterogeneous acceleration method based on end-to-end learning, wherein the training method of a clustering center in a differentiable clustering module comprises the following steps:
Initializing a group of cluster centers, wherein the number and the dimension of the cluster centers are trainable parameters;
simultaneously calculating the distances between all data points and all clustering centers;
Taking the average value of the distance from each data point to the nearest clustering center as a clustering loss;
And minimizing the clustering loss in the training process of the graph convolution neural network so as to acquire the number and the dimension of the clustering centers.
According to the invention, the heterogeneous acceleration method based on end-to-end learning is provided, and the calculating of the distances between all data points and all clustering centers comprises the following steps:
Expanding the shape of the data points by a dimension-insertion function in PyTorch to add a feature dimension and obtain a three-dimensional data point shape, and expanding the shape of the cluster centers by a dimension-insertion function in PyTorch to obtain a three-dimensional center point shape;
expanding the three-dimensional data point shape into a data point shape tensor through a data point tensor expansion function, and expanding the three-dimensional central point shape into a central point shape tensor through a central point tensor expansion function;
calculating differences between all data points and all clustering centers in each characteristic dimension according to the data point shape tensors and the central point shape tensors;
And carrying out square operation on the difference, and summing square operation results along the last dimension of the characteristic tensor to obtain the square Euclidean distance between each data point and each clustering center.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which further comprises the following steps:
Converting tensor-formatted data point shape tensors into array or list-formatted data point shape features;
And converting the tensor-formatted center point shape tensor into an array or list-formatted center point shape feature.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which further comprises the following steps:
An Adam optimizer is used in the training process, which is used to adaptively adjust the learning rate during each training iteration.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which is used for acquiring the correlation among all characters in a regular expression corresponding to a non-deterministic finite automaton and comprises the following steps:
calculating a first conditional probability of surrounding characters generated by characters in the regular expression, and acquiring word vectors of the regular expression corresponding to the non-deterministic finite automaton according to the first conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which is used for acquiring the correlation among all characters in a regular expression corresponding to a non-deterministic finite automaton and comprises the following steps:
Calculating a second conditional probability of a corresponding character generated by surrounding characters of a certain character in the regular expression, and acquiring a word vector of the regular expression corresponding to the non-deterministic finite automaton according to the second conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which is a judging method for consistency of correlation between each regular character cluster and each character in a regular expression, and comprises the following steps:
If the correlation of a plurality of characters in the obtained regular expression is greater than a preset threshold, and the characters belong to the same regular character cluster;
then it is determined that the regular character cluster is consistent with the relevance of each character in the regular expression.
The invention provides a heterogeneous acceleration method based on end-to-end learning, which comprises the following steps:
Counting the occurrence frequency of different characters according to historical regular expression sample data;
Marking characters with the occurrence frequency higher than a preset high-frequency threshold value as high-frequency words, and taking the high-frequency words as positive sample labels;
and marking the characters with the occurrence frequency lower than a preset low-frequency threshold value as low-frequency words, and taking the low-frequency words as negative sample labels.
According to the heterogeneous acceleration method based on end-to-end learning, the high-frequency threshold value is the same as the low-frequency threshold value, or the low-frequency threshold value is smaller than the high-frequency threshold value.
The invention provides a heterogeneous acceleration method based on end-to-end learning, wherein the local hardware equipment comprises: a CPU or GPU; the heterogeneous device comprises an FPGA, further comprising:
The CPU or GPU sends control instructions to the FPGA through a register, wherein the control instructions comprise control start, reset and address offset.
The invention also provides a heterogeneous acceleration device based on end-to-end learning, comprising:
The generation module is used for acquiring a data control flow through the local hardware equipment and generating a non-deterministic finite automaton according to the generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow;
The analysis module is used for receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of the integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
The invention also provides a terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the heterogeneous acceleration method based on end-to-end learning according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the heterogeneous acceleration method based on end-to-end learning of any of the above.
According to the heterogeneous acceleration method, device, equipment and storage medium based on end-to-end learning, a data control flow is acquired through the local hardware device, and a non-deterministic finite automaton is generated from the generated regular expression; the non-deterministic finite automaton is used to characterize the regular expression so as to parse and filter the data control flow. The heterogeneous device receives the data control flow and the non-deterministic finite automaton, analyzes the non-deterministic finite automaton with the graph convolutional neural network model of the integrated clustering module, and, when the non-deterministic finite automaton matches the regular expression it represents, configures it onto a corresponding regular engine to parse and filter the data control flow in parallel. The graph convolutional neural network model of the integrated clustering module is obtained by jointly training the differentiable clustering module and the graph convolutional neural network; with it, whether the non-deterministic finite automaton represents the regular rule can be verified effectively, so that more efficient regular expression matching is realized on the FPGA, more developers can easily use the FPGA to develop acceleration applications, and the field of hardware acceleration based on regular expression matching is advanced.
Drawings
In order to more clearly illustrate the invention or the technical solutions in the related art, the following description will briefly explain the drawings used in the embodiments or the related art description, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art.
Fig. 1 is a schematic flow chart of a heterogeneous acceleration method based on end-to-end learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a device deployment provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a non-deterministic finite automaton topology provided by an embodiment of the present invention;
FIG. 4 is a state machine undirected graph of a non-deterministic finite automaton provided by an embodiment of the present invention;
Fig. 5 is a schematic functional structure diagram of a heterogeneous acceleration device based on end-to-end learning according to an embodiment of the present invention;
fig. 6 is a schematic functional structure of a terminal device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of an end-to-end learning-based heterogeneous acceleration method provided by an embodiment of the present invention, as shown in fig. 1, where the end-to-end learning-based heterogeneous acceleration method provided by the embodiment of the present invention includes:
step 101, acquiring a data control flow through a local hardware device, and generating a non-deterministic finite automaton according to a generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow;
102, receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of an integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
In an embodiment of the present invention, a local hardware device includes: a CPU or GPU; the heterogeneous device comprises an FPGA, further comprising: the CPU or GPU sends control instructions into the FPGA through a register, wherein the control instructions comprise control start, reset and address offset.
In the embodiment of the invention, a CPU or GPU is deployed on a storage server of model NF5266M6, and heterogeneous acceleration is realized in cooperation with an acceleration card of model F37X. The workflow is as follows. First, the CPU transfers the database data to the double data rate (DDR) memory of the FPGA board by direct memory access (DMA). Meanwhile, the CPU generates the NFAs of the regular expressions, assembles the NFA information into frames (each frame can contain a plurality of regular expressions), and transmits the frames to the DDR of the FPGA board. The CPU feeds the necessary control information into the FPGA through registers, including start, reset, address offset, etc. Then, the frame data is parsed and the different NFAs are configured onto different regular engines according to the configuration information. Once the configuration of the regular engines is completed, the system starts to parse and filter the data frames in parallel and finally gathers the results; hardware acceleration is realized by processing the data frames in parallel.
As shown in fig. 2, the key hardware components include the CPU, the DDR memory of the FPGA board, the FPGA board itself, and the regular engines. The data and control flow starts from the CPU and is transmitted to the DDR of the FPGA through DMA; the CPU then sends control information to the FPGA through registers, and finally the regular engines inside the FPGA process the data. The roles of the hardware components are as follows:
The CPU, as the central processing unit of the system, is responsible for generating the NFAs of the regular expressions and controlling the flow of data and control information; it uses DMA to transfer data from the database to the DDR memory of the FPGA board. The DDR memory stores the data transmitted from the database and the frames containing the regular expression NFAs generated by the CPU. The FPGA board receives control information from the CPU through registers, parses the frame data, and configures the NFAs onto the regular engines for data parsing and filtering. The regular engine is a component in the FPGA that processes and parses the data frames in parallel and executes regular expression matching.
Since regular expressions tend to be long and numerous and often require modification, a large number of development cases still need to be verified during program development and deployment, even when the generation of state machines is automated by a program to improve the efficiency of the present architecture. However, given the large and flexible volume of On-Line Analytical Processing (OLAP) business data traffic and the limited development resources, traversing all possible regular expression queries is impractical, and there is no efficient way to verify whether a non-deterministic finite automaton generated from a regular expression is reasonable.
According to the heterogeneous acceleration method based on end-to-end learning provided by the embodiment of the invention, a data control flow is acquired through a local hardware device, and a non-deterministic finite automaton is generated from the generated regular expression; the non-deterministic finite automaton characterizes the regular expression so as to parse and filter the data control flow. The heterogeneous device receives the data control flow and the non-deterministic finite automaton, analyzes the non-deterministic finite automaton with the graph convolutional neural network model of the integrated clustering module, and, when the non-deterministic finite automaton matches the regular expression it represents, configures it onto a corresponding regular engine to parse and filter the data control flow in parallel. The graph convolutional neural network model of the integrated clustering module is obtained by jointly training the differentiable clustering module and the graph convolutional neural network; with it, whether the non-deterministic finite automaton represents the regular rule can be verified effectively, so that more efficient regular expression matching is realized on the FPGA, more developers can easily use the FPGA to develop acceleration applications, and the field of hardware acceleration based on regular expression matching is advanced.
Based on any of the above embodiments, the analyzing the non-deterministic finite automaton based on the graph convolutional neural network model of the integrated clustering module includes:
step 201, converting the nondeterministic finite automaton into a state machine undirected graph;
Step 202, inputting each node in the state machine undirected graph into the graph convolution neural network, and obtaining an embedded vector of each node;
Step 203, inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters; the graph convolution neural network and the differentiable clustering module are obtained through combined training;
In the embodiment of the invention, when the clustering module is integrated into the graph convolution neural network, the network parameters and the clustering center can be simultaneously optimized. This means that the network not only learns the feature representation of the data, but also how to cluster the data efficiently according to these features. This way of joint training may make the feature representation more adaptive to the clustering task, thereby improving the clustering effect.
Step 204, obtaining the correlation among the characters in the regular expression corresponding to the non-deterministic finite automaton;
and 205, judging that the non-deterministic finite automata is in a matching relation with the regular expression represented by the non-deterministic finite automata when the correlation between each regular character cluster and each character in the regular expression is consistent.
In the embodiment of the invention, the method for converting the nondeterministic finite automaton into the state machine undirected graph comprises the following steps:
2011, acquiring a state machine topology structure of the non-deterministic finite automaton;
Take the regular expression p(at)?(r|n) as an example: the state machine topology of the converted NFA is shown in fig. 3, where Si is the initial state, S0 to S4 are intermediate states, Sf is the final accepting state, and ε represents the empty string. The NFA is built automatically by a software program whose input is a regular expression and whose output is the NFA stack.
The state machine topology comprises the following edges:
Edges with empty strings (ε) from Si to S0;
From S0 to S1, edges with labels 'p';
from S1 to S2, edges with labels 'a';
from S2 to S3, edges with labels't';
from S3 to Sf, edges with labels 'r';
from S0 to S3, edges with labels 'p' (one jump, meaning that it is possible to transfer directly from S0 to S3 without going through S1 and S2);
from S0 to S4, edges with labels 'p';
from S2 to S4, edges with labels 'n';
From S4 to Sf, edges with labels 'n';
From S0 to S1, S0 to S3, S0 to S4 are edges with labels 'p', indicating that multiple transition paths exist between these states.
In this NFA, the ε-transition (i.e., empty-string transition) allows the automaton to move from Si to S0 without consuming input. The characters (p, a, t, r, n) on the other edges represent possible state transitions when a specific input is received. In addition, the self-loops in the figure indicate that the automaton may stay in the same state when receiving a particular input in a given state.
If there are multiple edges in the original NFA that have the same label and are connected to the same state, we can treat them as the same node. For example, if there are two edges with both labels 'p' connected to state S1, then there is only one p node in the new graph.
In the NFA state machine undirected graph shown in fig. 4, each node represents a unique edge tag in the original NFA. The connections between nodes represent edges in the original NFA that have a common state. The undirected graph is specifically described as follows:
Nodes: ε, p, a, t, r, n;
Edges: epsilon is connected to p, p is connected to a, p is connected to t, p is connected to r, t is connected to n, r is connected to n.
Step 2012, taking the edge from one node to another node in the state machine topology structure as an edge node in the state machine undirected graph;
Step 2013, in a state machine undirected graph, if two edge nodes are connected through a state in the state machine topology structure, creating an undirected edge between the two edge nodes;
step 2014, generating a state machine undirected graph structure based on the plurality of edge nodes and the plurality of undirected edges.
In some embodiments of the present invention, labels are provided on edges from each node to another node in the state machine topology, and if a plurality of edges in the state machine topology have the same label and are connected to the same state, the plurality of edges in the state machine topology are used as an edge node of the state machine undirected graph structure.
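As an illustration of steps 2011 to 2014, the following Python sketch converts the fig. 3 example NFA into edge nodes and undirected edges. It is an assumed toy implementation, not the embodiment's build program: for brevity it merges edges per label (sufficient here, since all same-labelled edges of the example attach to a common state), so its output adjacency may be denser than the edge list of fig. 4 depending on the exact merging rule.

```python
from collections import defaultdict
from itertools import combinations

# Transitions of the fig. 3 example NFA: (source state, target state, label).
nfa_transitions = [
    ("Si", "S0", "eps"), ("S0", "S1", "p"), ("S1", "S2", "a"),
    ("S2", "S3", "t"), ("S3", "Sf", "r"), ("S0", "S3", "p"),
    ("S0", "S4", "p"), ("S2", "S4", "n"), ("S4", "Sf", "n"),
]

def nfa_to_undirected_graph(transitions):
    # Step 1: labelled edges become "edge nodes"; edges sharing a label
    # (and, in this example, a common state) collapse into one node.
    touched = defaultdict(set)                 # label -> states it touches
    for src, dst, label in transitions:
        touched[label].update((src, dst))
    # Step 2: join two edge nodes by an undirected edge whenever they are
    # connected through a common state of the topology.
    edges = {frozenset((a, b))
             for a, b in combinations(touched, 2)
             if touched[a] & touched[b]}
    return set(touched), edges

nodes, edges = nfa_to_undirected_graph(nfa_transitions)
print(sorted(nodes))                            # ['a', 'eps', 'n', 'p', 'r', 't']
print(sorted(tuple(sorted(e)) for e in edges))  # compare with fig. 4
```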
Based on any of the above embodiments, the graph convolutional neural network includes a plurality of graph convolution layers, and inputting each node of the state machine undirected graph into the graph convolutional neural network to obtain the embedded vector of each node includes:
step 2021, generating an adjacency matrix based on the state machine undirected graph, wherein the adjacency matrix comprises neighbor nodes of each node and characteristics thereof;
Step 2022, performing feature aggregation on each node in the state machine undirected graph and its neighbor nodes through at least one graph convolution layer;
Step 2023, combining the aggregated neighbor features and the self features to obtain an embedded vector of each node.
In an embodiment of the present invention, a graph convolutional neural network is made up of a stack of several such graph convolution layers; its inputs are the node features and the structure (adjacency matrix) of the state machine undirected graph, and its outputs are high-level feature representations of the nodes for various downstream tasks, such as node classification, graph classification, etc.
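A minimal PyTorch sketch of such a stack, assuming dense adjacency matrices and the standard symmetric degree normalization (the embodiment does not fix a particular normalization):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution layer: aggregate neighbor features through the
    self-loop-augmented, degree-normalized adjacency, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))        # keep each node's own features
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
        return torch.relu(self.linear(a_norm @ x))  # aggregate + combine

class GCN(nn.Module):
    """Two stacked graph convolution layers producing per-node embeddings."""
    def __init__(self, in_dim, hidden_dim, embed_dim):
        super().__init__()
        self.layer1 = GCNLayer(in_dim, hidden_dim)
        self.layer2 = GCNLayer(hidden_dim, embed_dim)

    def forward(self, x, adj):
        return self.layer2(self.layer1(x, adj), adj)  # embedded vector per node
```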
Based on any of the foregoing embodiments, the inputting the embedded vector of each node into the differentiable clustering module obtains a plurality of regular character clusters, including:
In the embodiment of the invention, the clustering algorithm can first organize the data in an unsupervised manner so as to find the natural groupings (i.e., clusters) of words or characters. By performing cluster analysis on the whole regular character set, hidden structures or patterns are found, providing a global view of the vocabulary space structure. A representative clustering algorithm is KMeans, a classical cluster analysis method widely used in data mining and machine learning. Its main purpose is to divide a set of data into clusters such that data points within the same cluster are as similar as possible, while data points of different clusters are as different as possible. The basic steps and principles of the KMeans algorithm are as follows:
initializing: the number K of clusters to be divided is first determined. Then randomly select K data points as the initial cluster center.
Assigning data points: for each data point, its distance from all K cluster centers is calculated and assigned to the cluster represented by the nearest cluster center.
Updating the cluster center: the center point of each cluster is recalculated, typically taking the average of all the data points within the cluster as the new cluster center.
Iteration: repeating the assignment and update steps until a stopping condition is met, such as the change of the cluster centers being smaller than a certain threshold, or a preset number of iterations being reached.
And (3) outputting results: the final K clusters and the cluster center are the output results of the algorithm.
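A compact sketch of these five steps, under the assumption of squared-Euclidean distances and random initialization from the data points:

```python
import torch

def kmeans(points, k, iters=100, tol=1e-4):
    # Initialization: randomly pick K data points as the initial cluster centers.
    centers = points[torch.randperm(points.size(0))[:k]].clone()
    for _ in range(iters):
        # Assignment: each data point goes to its nearest cluster center.
        labels = torch.cdist(points, centers).argmin(dim=1)
        # Update: each center moves to the mean of the points assigned to it.
        new_centers = torch.stack([
            points[labels == j].mean(dim=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        # Stopping condition: the change of the cluster centers is small enough.
        if (new_centers - centers).norm() < tol:
            centers = new_centers
            break
        centers = new_centers
    # Output: the K cluster centers and each point's cluster assignment.
    return centers, labels
```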
Combining the conventional KMeans algorithm with the graph convolutional neural network exposes a substantial difference between the operating mechanism of the conventional KMeans algorithm and the training process of the neural network. First, the iterative clustering method converges to the final clustering result by alternating two steps: (1) assigning each data point to its nearest cluster center; (2) updating the center position of each cluster to the mean of the points it contains. This process is based on deterministic rules and involves no gradient descent or any step that can be optimized by back propagation. Thus, the conventional KMeans algorithm itself is not differentiable and cannot be embedded directly into the differentiable framework of the neural network. Second, the training of neural networks relies on gradient descent, which requires that the entire model (including all its operations and components) be differentiable so that gradients can be calculated and optimized by back propagation. The cluster-center update step of the conventional KMeans algorithm is a hard-assignment process: a data point either belongs entirely to a cluster or not at all. This hard-assignment mechanism lacks gradual transitions and cannot generate a continuous gradient signal for gradient descent. In addition, the convergence properties of the conventional KMeans algorithm also differ from the training process of the neural network.
The conventional KMeans algorithm typically converges to a locally optimal solution whose quality depends strongly on the choice of the initial cluster centers. In contrast, neural networks attempt to find a global optimum through gradient descent on training data, and their performance depends less on the exact initial parameter values. Therefore, the conventional KMeans algorithm and the graph convolutional neural network can only be trained independently: the graph convolutional neural network learns the feature representation of the nodes on its own, based entirely on the graph structure and node features, without considering the clustering information of the nodes. When KMeans is then used for clustering, it clusters the nodes based on these feature representations but does not influence how the graph convolutional neural network learns them. Thus, the graph convolutional neural network cannot take the node-clustering objective into account when learning feature representations, which may result in learned representations that are not optimal for the clustering task.
In view of the above problems, in the embodiments of the present invention, a novel clustering method is implemented by converting the conventional KMeans algorithm into a differentiable form and combining it with a neural network model. The method not only maintains the characteristics of intuitiveness and simplicity of KMeans clustering, but also meets the requirements of a deep learning framework, so that the method can play a role in more complex data analysis tasks. Unlike the conventional KMeans algorithm, the differentiable clustering module does not have a hard assignment (i.e., assigns data points directly to the nearest cluster center), but rather indirectly affects the update of the cluster center by calculating the loss. This design makes the whole clustering process differentiable and therefore can be optimized by gradient descent.
While the graph convolutional neural network learns the node feature representation in the process of the graph convolutional neural network and KMeans joint training, the KMeans module clusters the nodes based on the current feature representation, and the clustering result can feedback to influence the learning of the graph convolutional neural network. This feedback mechanism allows the graph convolution neural network to learn a node feature representation that is more suitable for the clustering task. The clustering target becomes part of the graph convolutional neural network feature learning process and helps guide the graph convolutional neural network to learn the features of more differentiated regular characters.
In the embodiment of the invention, the differentiable clustering module (KMeansModule) is a class inheriting from nn.Module; it implements the key steps of the KMeans clustering algorithm with appropriate modifications to fit the training mechanism of the neural network. The core design is to define the cluster centers as trainable parameters of the module and to implement the clustering process by calculating the distances between the input data points and these cluster centers.
Based on any one of the above embodiments, the training method for the clustering center in the differentiable clustering module includes:
Initializing a group of cluster centers, wherein the number and the dimension of the cluster centers are trainable parameters;
simultaneously calculating the distances between all data points and all clustering centers;
Taking the average value of the distance from each data point to the nearest clustering center as a clustering loss;
And minimizing the clustering loss in the training process of the graph convolution neural network so as to acquire the number and the dimension of the clustering centers.
For example, in the constructor of KMeansModule, a set of cluster centers is first initialized. These centers are trainable parameters of the module; their number is specified by the parameter num_clusters, and the dimension of each center is determined by num_features. These cluster centers are updated continuously during training to better represent the clustering structure of the data.
In the forward propagation (forward) function of the module, logic is implemented that calculates the Euclidean distance between the input data point and the cluster center. To perform this calculation efficiently, a PyTorch broadcast mechanism (broadcasting) is utilized to expand the dimensions of the data points and cluster centers, thereby enabling the simultaneous calculation of the distances between all data points and all cluster centers.
KMeansModule is calculated as the average of the distances of each data point to its nearest cluster center, which is used as the loss of clusters. By minimizing this clustering loss during the training of the graph-convolution neural network, the network is able to learn a feature representation that effectively clusters data points.
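Putting the above together, a hedged sketch of what such a KMeansModule could look like (the names num_clusters, num_features, and cluster_centers follow the description; everything else is an assumption — the distance computation inside forward is detailed in the paragraphs below):

```python
import torch
import torch.nn as nn

class KMeansModule(nn.Module):
    """Differentiable clustering: the cluster centers are trainable parameters
    and the clustering loss is minimized by gradient descent instead of the
    classical hard reassignment step."""
    def __init__(self, num_clusters, num_features):
        super().__init__()
        # Trainable cluster centers, updated by back propagation.
        self.cluster_centers = nn.Parameter(torch.randn(num_clusters, num_features))

    def forward(self, x):
        x_expand = x.unsqueeze(1)                           # (num_nodes, 1, num_features)
        centers_expand = self.cluster_centers.unsqueeze(0)  # (1, num_clusters, num_features)
        # Broadcasting yields all pairwise squared Euclidean distances at once.
        sq_dist = ((x_expand - centers_expand) ** 2).sum(dim=-1)  # (num_nodes, num_clusters)
        cluster_loss = sq_dist.min(dim=1).values.mean()     # mean distance to nearest center
        assignments = sq_dist.argmin(dim=1)                 # cluster index per node
        return cluster_loss, assignments
```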
In an embodiment of the present invention, calculating the distances between all data points and all cluster centers includes:
Adding a data point dimension function in PyTorch, adding the shape feature dimension of the data point to obtain a three-dimensional data point shape, and adding the shape feature of the clustering center by adding a center point dimension function in PyTorch to obtain a three-dimensional center point shape;
expanding the three-dimensional data point shape into a data point shape tensor through a data point tensor expansion function, and expanding the three-dimensional central point shape into a central point shape tensor through a central point tensor expansion function;
calculating differences between all data points and all clustering centers in each characteristic dimension according to the data point shape tensors and the central point shape tensors;
And carrying out square operation on the difference, and summing square operation results along the last dimension of the characteristic tensor to obtain the square Euclidean distance between each data point and each clustering center.
When calculating the Euclidean distance, the PyTorch broadcast mechanism is applied to simplify and optimize the computation. The Euclidean distance between two points is the square root of the sum of the squared per-dimension differences. In multidimensional space, with multiple points and multiple centers, the broadcast mechanism is particularly important when the distance from each point to each center is calculated. Consider the scenario in the KMeans algorithm: define a set of data points X of shape (num_nodes, num_features) and a set of cluster centers C of shape (num_clusters, num_features). A straightforward method of calculating the Euclidean distance from each data point to each cluster center requires a nested loop, which is extremely inefficient on large data sets. Instead, the broadcast mechanism can be applied, with the following specific steps:
(1) Expanding the dimension of the data points and cluster centers:
the shape of the data points X is extended from (num_nodes, num_features) to (num_nodes, 1, num_features) by x.unsqueeze(1).
The shape of the cluster centers C is extended from (num_clusters, num_features) to (1, num_clusters, num_features) by self.cluster_centers.unsqueeze(0).
(2) Calculating the difference using a broadcast mechanism:
PyTorch automatically applies the broadcast mechanism when performing the x_expand - centers_expand operation. Although x_expand and centers_expand differ in size in the second dimension (1 versus num_clusters), PyTorch automatically expands the shapes of the two tensors so that they match in that dimension. The result is a new tensor of shape (num_nodes, num_clusters, num_features) representing the difference between all data points and all cluster centers in each feature dimension.
(3) Calculating a sum of squares:
The result of the previous step is squared and then summed along the last dimension (i.e., the num_features dimension), yielding the squared Euclidean distance between each data point and each cluster center as a tensor of shape (num_nodes, num_clusters), where each element is the squared distance between one data point and one cluster center.
In this way, the broadcasting mechanism enables us to avoid explicit cyclic calculation of the distance between each pair of data points and the cluster center, greatly simplifying the code and improving the calculation efficiency. The underlying optimization with PyTorch enables matrix operations to run efficiently on hardware, especially on GPUs.
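A small shape walk-through of the three broadcasting steps, with assumed sizes num_nodes=6, num_clusters=3, num_features=4:

```python
import torch

x = torch.randn(6, 4)                    # data points: (num_nodes=6, num_features=4)
c = torch.randn(3, 4)                    # cluster centers: (num_clusters=3, 4)
x_expand = x.unsqueeze(1)                # (6, 1, 4)
centers_expand = c.unsqueeze(0)          # (1, 3, 4)
diff = x_expand - centers_expand         # broadcast to (6, 3, 4)
sq_dist = (diff ** 2).sum(dim=-1)        # (6, 3) squared Euclidean distances
print(sq_dist.shape)                     # torch.Size([6, 3])
```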
In the embodiment of the invention, the heterogeneous acceleration method based on end-to-end learning further comprises the following steps:
Converting tensor-formatted data point shape tensors into array or list-formatted data point shape features;
And converting the tensor-formatted center point shape tensor into an array or list-formatted center point shape feature.
Because downstream Python libraries and databases support array or list formats, converting features from tensor format to array or list format is more suitable for downstream data presentation.
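For instance, assuming the KMeansModule and data tensor x from the sketches above, the conversion could look like this (detaching first because the cluster centers carry gradients):

```python
# Tensors that require grad must be detached (and moved to CPU) before export.
centers_array = kmeans_module.cluster_centers.detach().cpu().numpy()  # numpy array
centers_list = centers_array.tolist()                                 # nested list
points_list = x.detach().cpu().tolist()                               # nested list
```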
Based on any of the above embodiments, the joint training of the graph convolution neural network and the differentiable clustering module includes:
step 2031, obtaining a training data set, wherein the training data set comprises a plurality of nodes and corresponding labels;
in an embodiment of the present invention, the tag obtaining method includes:
Counting the occurrence frequency of different characters according to historical regular expression sample data;
Marking characters with the occurrence frequency higher than a preset high-frequency threshold value as high-frequency words, and taking the high-frequency words as positive sample labels;
and marking the characters with the occurrence frequency lower than a preset low-frequency threshold value as low-frequency words, and taking the low-frequency words as negative sample labels.
In an embodiment of the present invention, the high frequency threshold is the same as the low frequency threshold, or the low frequency threshold is smaller than the high frequency threshold.
Step 2032, inputting the training data set into the graph convolution neural network to obtain an embedded vector of each node;
step 2033, inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters;
Step 2034, calculating loss values corresponding to the regular character clusters based on the comprehensive loss function;
Step 2035, performing back propagation according to the loss value and updating model parameters by an optimizer until a training end condition is met, so as to obtain a trained graph convolution neural network model of the integrated cluster module;
The comprehensive loss function comprises classification loss, clustering loss and regularization term of a Laplace matrix of the nodes;
in an embodiment of the present invention, the calculating the loss value of the embedded vector based on the comprehensive loss function includes:
Calculating the difference between the embedded vector corresponding to the marking node data and the real label by using a cross entropy loss function;
using regularization terms of the Laplace matrix to encourage adjacent nodes in unlabeled nodes to have similar characteristic representations by using structural information of the state machine undirected graph;
and calculating the average value of the distance from each node to the nearest clustering center by using the clustering loss.
Typically, the average of the distances from each data point to its nearest cluster center is used. This loss reflects the compactness of the clusters: smaller values indicate that data points are closer to their cluster centers and the clustering result is better. A composite model combining a graph convolutional neural network (GCN) and KMeans clustering is optimized by joint training. First, the model learns the feature representation of the nodes through the GCN while clustering these features with the KMeans module. An Adam optimizer is used in training; it is an effective adaptive learning-rate optimization method suitable for large-scale graph-structured data.
In the present example, the loss function consists of three parts: the classification loss, the clustering loss, and the regularization term based on the graph Laplacian matrix of the nodes. The combination of the three considers both the accuracy of node classification and the compactness of clustering, while the Laplacian regularization term preserves the integrity of the graph structure. In the training cycle, at each epoch the model computes the feature representations of the nodes by forward propagation, computes the total loss from these representations, and updates the model parameters through back propagation and the optimizer. The performance of the model is also evaluated on the test set after training ends. This joint training method lets the model learn node feature representations and cluster the nodes at the same time, providing a powerful and flexible tool for graph data analysis. Not only is a high-quality embedded representation obtained for each node of the undirected graph corresponding to each NFA, but the nodes are also effectively assigned to different cluster classes based on these representations. This approach exploits both the deep node features learned by the GCN and the clustering capability of KMeans to achieve finer and more meaningful cluster analysis on graph data.
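A sketch of such a composite loss, assuming dense adjacency and hypothetical weighting coefficients lam and mu for the Laplacian and clustering terms:

```python
import torch
import torch.nn.functional as F

def total_loss(h, logits, labels, labeled_mask, adj, cluster_loss,
               lam=1.0, mu=1.0):
    # (1) Classification loss: cross entropy on the labeled nodes only.
    cls_loss = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    # (2) Laplacian regularizer tr(H^T L H) with L = D - A: pushes adjacent
    #     (including unlabeled) nodes towards similar representations.
    laplacian = torch.diag(adj.sum(dim=1)) - adj
    lap_reg = torch.trace(h.t() @ laplacian @ h) / h.size(0)
    # (3) Clustering loss: mean distance of each node to its nearest center,
    #     as returned by the differentiable KMeans module.
    return cls_loss + lam * lap_reg + mu * cluster_loss
```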
In an embodiment of the invention, an Adam optimizer is used in the training process, and the Adam optimizer is used for adaptively adjusting the learning rate in each training iteration process.
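Building on the GCN, KMeansModule, and total_loss sketches above, a joint training loop with Adam could look like the following; all sizes, the dummy data, and the linear classification head are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dummy data for illustration: 6 nodes, 16 input features, 2 classes.
N, F_IN, H_DIM, E_DIM, K, C = 6, 16, 32, 8, 3, 2
features = torch.randn(N, F_IN)
adj = (torch.rand(N, N) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()                 # symmetric adjacency
labels = torch.randint(0, C, (N,))
labeled_mask = torch.tensor([True, True, False, False, True, False])

gcn = GCN(F_IN, H_DIM, E_DIM)
kmeans_module = KMeansModule(num_clusters=K, num_features=E_DIM)
classifier = nn.Linear(E_DIM, C)                    # assumed classification head

optimizer = torch.optim.Adam(                       # adaptive per-parameter learning rate
    list(gcn.parameters()) + list(kmeans_module.parameters())
    + list(classifier.parameters()), lr=1e-3)

for epoch in range(200):
    optimizer.zero_grad()
    h = gcn(features, adj)                          # forward: node embeddings
    cluster_loss, assignments = kmeans_module(h)
    loss = total_loss(h, classifier(h), labels, labeled_mask, adj, cluster_loss)
    loss.backward()                                 # joint gradients: GCN weights + centers
    optimizer.step()
```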
Based on any one of the above embodiments, the obtaining the correlation between the characters in the regular expression corresponding to the non-deterministic finite automaton includes, but is not limited to, the following schemes:
Scheme one: calculating a first conditional probability of surrounding characters generated by characters in the regular expression, and acquiring word vectors of the regular expression corresponding to the non-deterministic finite automaton according to the first conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
For example, generate a string that satisfies the regular expression, such as "patr". Taking an arbitrary character of the string as the center, scheme one models the conditional probability of generating the surrounding characters, e.g. P('p','a','r' | 't'). Since the probability distribution satisfies conditional independence, the probability density can be rewritten as P('p'|'t')·P('a'|'t')·P('r'|'t'). In this model, each character has two vector representations used to compute the corresponding probabilities: the model uses vectors s and t to represent a character as a center character and as a surrounding character respectively, and thereby estimates the conditional probability distribution as

P(w_o | w_c) = exp(t_o^T s_c) / Σ_{i∈V} exp(t_i^T s_c)

where the regular character index set V corresponds to the characters and patterns of interest to the regular matching process. Here P(w_o | w_c) is the probability that character w_o occurs given the center character w_c, t_o is the vector representation of character o when used as a surrounding character, s_c is the vector representation of character c when used as the center character, and t_o^T is the transpose of t_o.
Model parameters are estimated based on the likelihood function: stochastic gradient descent is used to maximize the logarithm of the likelihood probability density, with the gradient computed with respect to the vectors:

∂ log P(w_o | w_c) / ∂ s_c = t_o − Σ_{j∈V} P(w_j | w_c) t_j

so that optimizing s_c involves the predicted probabilities of all the upstream and downstream characters of the center character.
After training, the vector s output by the model for each character is used for correlation analysis. When the correlation between two characters is greater than a certain threshold (e.g., greater than 0.95), it is checked whether the two characters are connected to the same state in the state machine; if so, the generation of the strongly correlated state machine is accurate, otherwise the state machine reasoning is problematic.
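A small sketch of this correlation check, using cosine similarity over hypothetical trained character vectors:

```python
import torch
import torch.nn.functional as F

def strongly_correlated(char_vecs, c1, c2, threshold=0.95):
    # char_vecs: dict mapping each regular character to its trained vector s.
    v1, v2 = char_vecs[c1], char_vecs[c2]
    sim = F.cosine_similarity(v1.unsqueeze(0), v2.unsqueeze(0)).item()
    return sim > threshold

# Hypothetical trained vectors for the example characters of "p(at)?(r|n)".
char_vecs = {ch: torch.randn(8) for ch in "patrn"}
if strongly_correlated(char_vecs, "t", "r"):
    print("check whether 't' and 'r' connect to the same state in the NFA")
```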
Scheme II: calculating a second conditional probability of a corresponding character generated by surrounding characters of a certain character in the regular expression, and acquiring a word vector of the regular expression corresponding to the non-deterministic finite automaton according to the second conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
In the embodiment of the present invention, the center character is predicted from the surrounding upstream and downstream characters. For "patr", taking an arbitrary character as the center, the conditional probability of 't' given 'p', 'a', 'r' is P('t' | 'p','a','r'); in general, the conditional probability of the center character given the upstream and downstream regular characters can be written as

P(w_c | w_{o1}, ..., w_{o2m}) = exp(t_c^T s̄_o) / Σ_{i∈V} exp(t_i^T s̄_o), with s̄_o = (s_{o1} + ... + s_{o2m}) / (2m)

Similar to scheme one, s and t are the two vector representations of a character in the regular expression, here s for a surrounding character and t for the center character. The context is a character subset of size K consisting of the contextual characters of the center character w_c, and the character index set V corresponds to the characters or patterns relevant to the regular matching process. On this basis, the maximum likelihood probability of model two is the product of P(w_c | context(w_c)) over the training characters.
The model parameters are estimated based on the likelihood function. After training, the surrounding vector $t$ output by model two is used for correlation analysis. When the correlation between characters exceeds a certain threshold (e.g., greater than 0.95), it is checked whether the two characters are connected to the same state in the state machine. For example, if the characters 't' and 'n' are connected through the state 'S4', the state machine correctly reflects the strong correlation; otherwise, the state machine reasoning is problematic.
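A corresponding minimal sketch of scheme two (predicting the center character from the averaged surrounding vectors) might look as follows; the `CBOW` class name and the hyperparameters are illustrative assumptions, and training proceeds as in the scheme-one sketch, with (context, center) samples in place of (center, surrounding) pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBOW(nn.Module):
    """Scheme two: predict the center character from its surrounding characters."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.s = nn.Embedding(vocab_size, dim)  # center-character vectors
        self.t = nn.Embedding(vocab_size, dim)  # surrounding-character vectors

    def forward(self, contexts, center):
        # Average the surrounding vectors: (1/K) * (t_o1 + ... + t_oK).
        mean_t = self.t(contexts).mean(dim=1)   # (batch, dim)
        logits = mean_t @ self.s.weight.T       # (batch, V): s_i^T on the average
        return F.cross_entropy(logits, center)  # -log P(w_c | context)
```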
In the embodiment of the invention, the character/character string with the strongest correlation in the non-deterministic finite automaton should be consistent with the character/character string with the strongest correlation in the regular expression, and two words whose correlation is greater than 0.95 can be judged to be identical.
According to the heterogeneous acceleration method based on end-to-end learning provided by the embodiment of the invention, by introducing the graph convolution neural network model with the integrated clustering module, a regular acceleration scheme based on the FPGA can be rapidly developed and tested, so that performance bottlenecks are relieved and the processing efficiency of flexible service cases is improved. By using the state machine graph model, regular rules can be represented more effectively, so that more efficient regular expression matching is realized on the FPGA. The method solves the computing performance problem that the related art may face when processing a large number of regular expressions, and is also expected to provide a more powerful tool for fields such as network security, thereby improving the real-time detection of threats. Secondly, by adopting an end-to-end learning method, the system can more intelligently judge whether the state machine accurately represents the regular rule, thereby improving the development efficiency of the regular acceleration scheme. This not only reduces the burden on developers in FPGA programming, but is also expected to promote wider application of regular expression matching technology in various fields. By providing two correlation-based schemes, efficient means of verification and development are provided while maintaining flexibility, which is of great significance for fields such as OLAP business data services where a large number of flexible cases exist.
The heterogeneous acceleration device based on end-to-end learning provided by the invention is described below, and the heterogeneous acceleration device based on end-to-end learning described below and the heterogeneous acceleration method based on end-to-end learning described above can be correspondingly referred to each other.
Fig. 5 is a functional structural schematic diagram of a heterogeneous acceleration device based on end-to-end learning according to an embodiment of the present invention, where, as shown in fig. 5, the heterogeneous acceleration device based on end-to-end learning according to an embodiment of the present invention includes:
A generating module 501, configured to obtain a data control flow through a local hardware device, and generate a non-deterministic finite automaton according to a generated regular expression, where the non-deterministic finite automaton is used to characterize the regular expression, so as to analyze and filter the data control flow;
the analysis module 502 is configured to receive the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyze the non-deterministic finite automaton based on a graph convolution neural network model of the integrated clustering module, and configure the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relationship with a regular expression represented by the non-deterministic finite automaton, and perform parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
According to the heterogeneous acceleration device based on end-to-end learning provided by the embodiment of the invention, a data control flow is obtained through a local hardware device, and a non-deterministic finite automaton is generated according to a generated regular expression, the non-deterministic finite automaton being used to characterize the regular expression so as to analyze and filter the data control flow; the data control flow and the non-deterministic finite automaton are received by heterogeneous equipment, the non-deterministic finite automaton is analyzed based on the graph convolution neural network model of the integrated clustering module, and when the non-deterministic finite automaton is in a matching relation with the regular expression it represents, the non-deterministic finite automaton is configured to the corresponding regular engine to perform parallel analysis and filtering on the data control flow; the graph convolution neural network model of the integrated clustering module is obtained based on the joint training of the differentiable clustering module and the graph convolution neural network, and whether the non-deterministic finite automaton represents a regular rule can be effectively verified by using the graph convolution neural network model of the integrated clustering module, so that more efficient regular expression matching is realized on the FPGA, more developers can easily apply the FPGA to develop acceleration applications, and the development of the field of hardware acceleration technology based on regular expression matching is promoted.
In an embodiment of the present invention, the analysis module 502 is configured to:
converting the nondeterministic finite automaton into a state machine undirected graph;
inputting each node in the state machine undirected graph into the graph convolution neural network, and obtaining an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters; the graph convolution neural network and the differentiable clustering module are obtained through combined training;
Acquiring the correlation among all characters in the regular expression corresponding to the non-deterministic finite automaton;
And when the correlation between each regular character cluster and each character in the regular expression is consistent, judging that the non-deterministic finite automata is in a matching relation with the regular expression represented by the non-deterministic finite automata.
In an embodiment of the present invention, the converting the non-deterministic finite automaton into a state machine undirected graph includes:
acquiring a state machine topological structure of the non-deterministic finite automaton;
Taking the edge from one node to the other node in the topological structure of the state machine as an edge node in the undirected graph of the state machine;
In a state machine undirected graph, if two edge nodes are connected through a state in the state machine topology, creating an undirected edge between the two edge nodes;
a state machine undirected graph structure is generated based on the plurality of edge nodes and the plurality of undirected edges.
According to the heterogeneous acceleration method based on end-to-end learning, labels are arranged on edges from each node to another node in the state machine topological structure, and if a plurality of edges in the state machine topological structure have the same labels and are connected to the same state, the edges in the state machine topological structure are used as one edge node of the state machine undirected graph structure.
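The following sketch illustrates one possible reading of this conversion: each NFA transition becomes an edge node, transitions carrying the same label into the same state are merged into a single edge node, and two edge nodes are joined by an undirected edge when they share a state. The example NFA transitions and the use of networkx are illustrative assumptions, not taken from the patent.

```python
import networkx as nx

# Hypothetical NFA for illustration: transitions (source_state, label, target_state).
transitions = [
    ("S0", "p", "S1"), ("S1", "a", "S2"),
    ("S2", "t", "S3"), ("S3", "r", "S4"),
]

def nfa_to_undirected_graph(transitions):
    """Each NFA transition (edge) becomes a node of the undirected graph; two
    edge-nodes are joined when they share a state in the NFA topology. Edges
    with the same label entering the same state are merged into one node."""
    g = nx.Graph()
    merged = {}  # (label, target state) -> set of source states
    for src, label, dst in transitions:
        key = (label, dst)
        merged.setdefault(key, set()).add(src)
        g.add_node(key, label=label)
    keys = list(merged)
    for i, k1 in enumerate(keys):
        for k2 in keys[i + 1:]:
            states1 = merged[k1] | {k1[1]}
            states2 = merged[k2] | {k2[1]}
            if states1 & states2:  # the two transitions touch a common state
                g.add_edge(k1, k2)
    return g

g = nfa_to_undirected_graph(transitions)
print(g.nodes(data=True))
print(g.edges())
```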
In an embodiment of the present invention, the graph convolution neural network includes a plurality of graph convolution layers, and the inputting each node in the state machine undirected graph into the graph convolution neural network, to obtain an embedded vector of each node includes:
generating an adjacency matrix based on the state machine undirected graph, wherein the adjacency matrix comprises neighbor nodes of each node and characteristics thereof;
feature aggregation is carried out on each node in the state machine undirected graph and its neighbor nodes through at least one graph convolution layer;
and combining the aggregated neighbor features with the self features to obtain the embedded vector of each node.
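A minimal sketch of such a graph convolution encoder follows, assuming a dense adjacency matrix with self-loops and symmetric normalization (a common GCN formulation; the specific normalization and layer count are assumptions, not specified by the patent).

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution layer: aggregate neighbor features through the
    normalized adjacency matrix, then combine them with the node's own features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Self-loops let each node keep its own features during aggregation.
        a_hat = adj + torch.eye(adj.size(0))
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # D^{-1/2} (A + I) D^{-1/2}
        return torch.relu(self.linear(a_norm @ x))

class GCNEncoder(nn.Module):
    """Stack of graph convolution layers producing one embedding per edge-node."""
    def __init__(self, in_dim, hidden_dim, embed_dim):
        super().__init__()
        self.layers = nn.ModuleList([GCNLayer(in_dim, hidden_dim),
                                     GCNLayer(hidden_dim, embed_dim)])

    def forward(self, x, adj):
        for layer in self.layers:
            x = layer(x, adj)
        return x  # (num_nodes, embed_dim) embedding vectors
```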
In the embodiment of the present invention, the inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters includes:
and the differentiable clustering module distributes each node into the nearest cluster class according to the distance between the embedded vector of each node and the clustering center to obtain a plurality of regular character clusters.
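A possible sketch of the differentiable clustering module, with trainable cluster centers and a soft (differentiable) assignment computed alongside the hard nearest-center assignment; the softmax temperature is an illustrative assumption.

```python
import torch
import torch.nn as nn

class DifferentiableClustering(nn.Module):
    """Trainable cluster centers; nodes are softly assigned to the nearest
    center so the assignment stays differentiable during joint training."""
    def __init__(self, num_clusters, embed_dim, temperature=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_clusters, embed_dim))
        self.temperature = temperature

    def forward(self, z):
        # Squared Euclidean distance from every node embedding to every center.
        dist = torch.cdist(z, self.centers) ** 2            # (N, K)
        soft_assign = torch.softmax(-dist / self.temperature, dim=1)
        hard_assign = dist.argmin(dim=1)                    # nearest cluster per node
        return soft_assign, hard_assign, dist
```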
In the embodiment of the invention, the graph convolution neural network and the differentiable clustering module are obtained by combined training, and the method comprises the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of nodes and corresponding labels;
Inputting the training data set into the graph convolution neural network to acquire an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters;
Calculating loss values corresponding to the regular character clusters based on the comprehensive loss function;
The back propagation and the updating of model parameters by the optimizer are carried out according to the loss value until the training ending condition is met, and a well trained graph convolution neural network model of the integrated clustering module is obtained;
The comprehensive loss function comprises classification loss, clustering loss and regularization term of a Laplace matrix of the nodes;
the calculating the loss value of the embedded vector based on the comprehensive loss function includes:
Calculating the difference between the embedded vector corresponding to the marking node data and the real label by using a cross entropy loss function;
using regularization terms of the Laplace matrix to encourage adjacent nodes in unlabeled nodes to have similar characteristic representations by using structural information of the state machine undirected graph;
and calculating the average value of the distance from each node to the nearest clustering center by using the clustering loss.
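The comprehensive loss could be sketched as follows, combining the three terms described above; the weighting coefficients `alpha` and `beta` and the trace form of the Laplacian regularizer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def composite_loss(z, logits, labels, labeled_mask, adj, dist, alpha=0.1, beta=0.1):
    """Composite loss = classification loss + Laplacian regularization
    + clustering loss. z: node embeddings, logits: classifier outputs,
    dist: node-to-center distances from the clustering module."""
    # 1) Cross entropy between predictions and true labels on labeled nodes.
    cls_loss = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    # 2) Laplacian regularization: adjacent nodes get similar representations.
    deg = torch.diag(adj.sum(dim=1))
    laplacian = deg - adj
    lap_loss = torch.trace(z.T @ laplacian @ z) / z.size(0)
    # 3) Clustering loss: mean distance from each node to its nearest center.
    clu_loss = dist.min(dim=1).values.mean()
    return cls_loss + alpha * lap_loss + beta * clu_loss
```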
In the embodiment of the invention, the training method of the clustering center in the differentiable clustering module comprises the following steps:
Initializing a group of cluster centers, wherein the number and the dimension of the cluster centers are trainable parameters;
simultaneously calculating the distances between all data points and all clustering centers;
Taking the average value of the distance from each data point to the nearest clustering center as a clustering loss;
And minimizing the clustering loss in the training process of the graph convolution neural network so as to acquire the number and the dimension of the clustering centers.
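This simultaneous distance computation can be sketched with PyTorch broadcasting: a dimension is added to the data points and to the cluster centers, both are expanded to a common three-dimensional shape, and squared differences are summed along the feature dimension. Tensor sizes here are illustrative.

```python
import torch

def pairwise_sq_distances(points, centers):
    """Broadcast-based squared Euclidean distances between all data points
    and all cluster centers in one shot."""
    # points: (N, D) -> (N, 1, D); centers: (K, D) -> (1, K, D)
    p = points.unsqueeze(1)                  # add a dimension to the data points
    c = centers.unsqueeze(0)                 # add a dimension to the center points
    p = p.expand(-1, centers.size(0), -1)    # (N, K, D) data point shape tensor
    c = c.expand(points.size(0), -1, -1)     # (N, K, D) center point shape tensor
    diff = p - c                             # per-feature differences
    return (diff ** 2).sum(dim=-1)           # sum squares along the last dimension

# Clustering loss: mean distance from each point to its nearest center.
points = torch.randn(10, 16)
centers = torch.nn.Parameter(torch.randn(4, 16))
loss = pairwise_sq_distances(points, centers).min(dim=1).values.mean()
```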
In an embodiment of the present invention, the method further includes:
An Adam optimizer is used in the training process, which is used to adaptively adjust the learning rate during each training iteration.
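A joint training loop assembling the pieces sketched above might look as follows; `GCNEncoder`, `DifferentiableClustering`, and `composite_loss` refer to the earlier sketches, the linear classifier head is assumed, and random tensors stand in for features and labels derived from a real state machine undirected graph.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for a real state machine undirected graph.
num_nodes, in_dim = 20, 8
x = torch.randn(num_nodes, in_dim)                     # node features
adj = (torch.rand(num_nodes, num_nodes) > 0.8).float()
adj = ((adj + adj.T) > 0).float()                      # symmetric adjacency
labels = torch.randint(0, 2, (num_nodes,))             # positive/negative labels
labeled_mask = torch.rand(num_nodes) > 0.5             # only some nodes labeled

encoder = GCNEncoder(in_dim, hidden_dim=32, embed_dim=16)
clustering = DifferentiableClustering(num_clusters=4, embed_dim=16)
classifier = nn.Linear(16, 2)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(clustering.parameters())
    + list(classifier.parameters()), lr=1e-3)

for epoch in range(100):
    optimizer.zero_grad()
    z = encoder(x, adj)
    _, _, dist = clustering(z)
    loss = composite_loss(z, classifier(z), labels, labeled_mask, adj, dist)
    loss.backward()
    optimizer.step()  # Adam adaptively adjusts the effective step per parameter
```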
In the embodiment of the invention, the obtaining the correlation among all characters in the regular expression corresponding to the non-deterministic finite automaton includes:
calculating a first conditional probability of surrounding characters generated by characters in the regular expression, and acquiring word vectors of the regular expression corresponding to the non-deterministic finite automaton according to the first conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
In the embodiment of the present invention, the obtaining the correlation between each character in the regular expression corresponding to the non-deterministic finite automaton includes:
Calculating a second conditional probability of a corresponding character generated by surrounding characters of a certain character in the regular expression, and acquiring a word vector of the regular expression corresponding to the non-deterministic finite automaton according to the second conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
In the embodiment of the invention, the judging method for consistency of the relevance between each regular character cluster and each character in the regular expression comprises the following steps:
If the correlation of a plurality of characters in the obtained regular expression is greater than a preset threshold, and the characters belong to the same regular character cluster;
then it is determined that the regular character cluster is consistent with the relevance of each character in the regular expression.
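A small helper sketching this consistency check; the data structures (a pair-to-correlation mapping and a character-to-cluster mapping) are illustrative assumptions.

```python
def clusters_consistent(corr, clusters, threshold=0.95):
    """Check that every strongly correlated character pair falls in the same
    regular character cluster. corr: dict mapping (char_a, char_b) to cosine
    similarity; clusters: dict mapping each character to its cluster id."""
    return all(clusters[a] == clusters[b]
               for (a, b), c in corr.items() if c > threshold)
```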
In an embodiment of the present invention, the tag obtaining method includes:
Counting the occurrence frequency of different characters according to historical regular expression sample data;
Marking characters with the occurrence frequency higher than a preset high-frequency threshold value as high-frequency words, and taking the high-frequency words as positive sample labels;
and marking the characters with the occurrence frequency lower than a preset low-frequency threshold value as low-frequency words, and taking the low-frequency words as negative sample labels.
In an embodiment of the present invention, the high frequency threshold is the same as the low frequency threshold, or the low frequency threshold is smaller than the high frequency threshold.
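A sketch of this frequency-based labeling; the concrete thresholds are illustrative and, per the embodiment above, may also coincide.

```python
from collections import Counter

def frequency_labels(samples, high_thresh=50, low_thresh=5):
    """Label characters by frequency across historical regular expression
    samples: high-frequency characters become positive sample labels,
    low-frequency characters negative sample labels."""
    counts = Counter(ch for regex in samples for ch in regex)
    positive = {ch for ch, n in counts.items() if n > high_thresh}
    negative = {ch for ch, n in counts.items() if n < low_thresh}
    return positive, negative
```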
According to the heterogeneous acceleration device based on end-to-end learning provided by the embodiment of the invention, by introducing the graph convolution neural network model with the integrated clustering module, a regular acceleration scheme based on the FPGA can be rapidly developed and tested, so that performance bottlenecks are relieved and the processing efficiency of flexible service cases is improved. By using the state machine graph model, regular rules can be represented more effectively, so that more efficient regular expression matching is realized on the FPGA. The device solves the computing performance problem that the related art may face when processing a large number of regular expressions, and is also expected to provide a more powerful tool for fields such as network security, thereby improving the real-time detection of threats. Secondly, by adopting an end-to-end learning method, the system can more intelligently judge whether the state machine accurately represents the regular rule, thereby improving the development efficiency of the regular acceleration scheme. This not only reduces the burden on developers in FPGA programming, but is also expected to promote wider application of regular expression matching technology in various fields. By providing two correlation-based schemes, efficient means of verification and development are provided while maintaining flexibility, which is of great significance for fields such as OLAP business data services where a large number of flexible cases exist.
Fig. 6 illustrates an entity structure diagram of a terminal device. As shown in fig. 6, the terminal device may include: a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The memory 630 stores a computer program, an operating system, and captured graph structure data, and the processor 610 may invoke logic instructions in the memory 630 to perform the heterogeneous acceleration method based on end-to-end learning, the method comprising: acquiring a data control flow through a local hardware device, and generating a non-deterministic finite automaton according to a generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow; receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of an integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with the regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow; the graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the heterogeneous acceleration method based on end-to-end learning provided by the above methods, the method comprising: acquiring a data control flow through a local hardware device, and generating a non-deterministic finite automaton according to a generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow; receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of an integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow; the graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
The apparatus embodiments described above are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A heterogeneous acceleration method based on end-to-end learning, characterized by comprising the following steps:
Acquiring a data control flow through a local hardware device, and generating a non-deterministic finite automaton according to a generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow;
Receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of an integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
2. The heterogeneous acceleration method based on end-to-end learning according to claim 1, wherein the analyzing the non-deterministic finite automaton based on the graph convolution neural network model of the integrated clustering module comprises:
converting the nondeterministic finite automaton into a state machine undirected graph;
inputting each node in the state machine undirected graph into the graph convolution neural network, and obtaining an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters; the graph convolution neural network and the differentiable clustering module are obtained through combined training;
Acquiring the correlation among all characters in the regular expression corresponding to the non-deterministic finite automaton;
And when the correlation between each regular character cluster and each character in the regular expression is consistent, judging that the non-deterministic finite automata is in a matching relation with the regular expression represented by the non-deterministic finite automata.
3. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the converting the non-deterministic finite automaton into a state machine undirected graph comprises:
acquiring a state machine topological structure of the non-deterministic finite automaton;
Taking the edge from one node to the other node in the topological structure of the state machine as an edge node in the undirected graph of the state machine;
In a state machine undirected graph, if two edge nodes are connected through a state in the state machine topology, creating an undirected edge between the two edge nodes;
a state machine undirected graph structure is generated based on the plurality of edge nodes and the plurality of undirected edges.
4. A heterogeneous acceleration method based on end-to-end learning according to claim 3, wherein labels are set on edges from each node to another node in the state machine topology, and if there are multiple edges in the state machine topology with the same labels and connected to the same state, the multiple edges in the state machine topology are used as one edge node of the state machine undirected graph structure.
5. The heterogeneous acceleration method of claim 2, wherein the graph convolution neural network includes a plurality of graph convolution layers, and the inputting each node in the state machine undirected graph into the graph convolution neural network to obtain an embedded vector of each node includes:
generating an adjacency matrix based on the state machine undirected graph, wherein the adjacency matrix comprises neighbor nodes of each node and characteristics thereof;
feature aggregation is carried out on each node in the state machine undirected graph and its neighbor nodes through at least one graph convolution layer;
and combining the aggregated neighbor features with the self features to obtain the embedded vector of each node.
6. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the inputting the embedded vector of each node into the differentiable clustering module obtains a plurality of regular character clusters, and the method comprises:
and the differentiable clustering module distributes each node into the nearest cluster class according to the distance between the embedded vector of each node and the clustering center to obtain a plurality of regular character clusters.
7. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the graph convolution neural network and the differentiable clustering module are obtained by joint training, comprising the following steps:
acquiring a training data set, wherein the training data set comprises a plurality of nodes and corresponding labels;
Inputting the training data set into the graph convolution neural network to acquire an embedded vector of each node;
Inputting the embedded vector of each node into the differentiable clustering module to obtain a plurality of regular character clusters;
Calculating loss values corresponding to the regular character clusters based on the comprehensive loss function;
The back propagation and the updating of model parameters by the optimizer are carried out according to the loss value until the training ending condition is met, and a well trained graph convolution neural network model of the integrated clustering module is obtained;
The comprehensive loss function comprises classification loss, clustering loss and regularization term of a Laplace matrix of the nodes;
the calculating the loss value of the embedded vector based on the comprehensive loss function includes:
Calculating the difference between the embedded vector corresponding to the marking node data and the real label by using a cross entropy loss function;
using regularization terms of the Laplace matrix to encourage adjacent nodes in unlabeled nodes to have similar characteristic representations by using structural information of the state machine undirected graph;
and calculating the average value of the distance from each node to the nearest clustering center by using the clustering loss.
8. The heterogeneous acceleration method based on end-to-end learning according to claim 7, wherein the training method of the clustering center in the differentiable clustering module comprises:
Initializing a group of cluster centers, wherein the number and the dimension of the cluster centers are trainable parameters;
Calculating the distance between all data points and all clustering centers;
Taking the average value of the distance from each data point to the nearest clustering center as a clustering loss;
And minimizing the clustering loss in the training process of the graph convolution neural network so as to acquire the number and the dimension of the clustering centers.
9. The heterogeneous acceleration method of claim 8, wherein the calculating distances between all data points and all cluster centers comprises:
Adding a data point dimension function in PyTorch, adding the shape feature dimension of the data point to obtain a three-dimensional data point shape, and adding the shape feature of the clustering center by adding a center point dimension function in PyTorch to obtain a three-dimensional center point shape;
expanding the three-dimensional data point shape into a data point shape tensor through a data point tensor expansion function, and expanding the three-dimensional central point shape into a central point shape tensor through a central point tensor expansion function;
calculating differences between all data points and all clustering centers in each characteristic dimension according to the data point shape tensors and the central point shape tensors;
And carrying out square operation on the difference, and summing square operation results along the last dimension of the characteristic tensor to obtain the square Euclidean distance between each data point and each clustering center.
10. The heterogeneous acceleration method based on end-to-end learning according to claim 9, further comprising:
Converting tensor-formatted data point shape tensors into array or list-formatted data point shape features;
And converting the tensor-formatted center point shape tensor into an array or list-formatted center point shape feature.
11. The heterogeneous acceleration method based on end-to-end learning of claim 8, further comprising:
An Adam optimizer is used in the training process, which is used to adaptively adjust the learning rate during each training iteration.
12. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the obtaining the correlation between the characters in the regular expression corresponding to the non-deterministic finite automaton comprises:
calculating a first conditional probability of surrounding characters generated by characters in the regular expression, and acquiring word vectors of the regular expression corresponding to the non-deterministic finite automaton according to the first conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
13. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the obtaining the correlation between the characters in the regular expression corresponding to the non-deterministic finite automaton comprises:
Calculating a second conditional probability of a corresponding character generated by surrounding characters of a certain character in the regular expression, and acquiring a word vector of the regular expression corresponding to the non-deterministic finite automaton according to the second conditional probability;
correlation between the plurality of word vectors is calculated based on the cosine similarity.
14. The heterogeneous acceleration method based on end-to-end learning according to claim 2, wherein the method for determining consistency of relevance between each regular character cluster and each character in the regular expression comprises:
If the correlation of a plurality of characters in the obtained regular expression is greater than a preset threshold, and the characters belong to the same regular character cluster;
then it is determined that the regular character cluster is consistent with the relevance of each character in the regular expression.
15. The heterogeneous acceleration method based on end-to-end learning according to claim 7, wherein the tag acquisition method comprises:
Counting the occurrence frequency of different characters according to historical regular expression sample data;
Marking characters with the occurrence frequency higher than a preset high-frequency threshold value as high-frequency words, and taking the high-frequency words as positive sample labels;
and marking the characters with the occurrence frequency lower than a preset low-frequency threshold value as low-frequency words, and taking the low-frequency words as negative sample labels.
16. The heterogeneous acceleration method based on end-to-end learning according to claim 15, wherein the high frequency threshold is the same as the low frequency threshold, or the low frequency threshold is smaller than the high frequency threshold.
17. The heterogeneous acceleration method based on end-to-end learning according to claim 1, wherein the local hardware device comprises: a CPU or GPU; the heterogeneous device comprises an FPGA, further comprising:
The CPU or GPU sends control instructions to the FPGA through a register, wherein the control instructions comprise control start, reset and address offset.
18. A heterogeneous acceleration device based on end-to-end learning, characterized by comprising:
The generation module is used for acquiring a data control flow through the local hardware equipment and generating a non-deterministic finite automaton according to the generated regular expression, wherein the non-deterministic finite automaton is used for representing the regular expression so as to analyze and filter the data control flow;
The analysis module is used for receiving the data control flow and the non-deterministic finite automaton through heterogeneous equipment, analyzing the non-deterministic finite automaton based on a graph convolution neural network model of the integrated clustering module, and configuring the non-deterministic finite automaton to a corresponding regular engine when the non-deterministic finite automaton is in a matching relation with a regular expression represented by the non-deterministic finite automaton, and carrying out parallel analysis and filtering on the data control flow;
The graph convolution neural network model of the integrated clustering module is obtained based on the combined training of the differentiable clustering module and the graph convolution neural network.
19. A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the heterogeneous acceleration method based on end-to-end learning of any one of claims 1 to 17 when the program is executed by the processor.
20. A non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, implements the heterogeneous acceleration method based on end-to-end learning of any one of claims 1 to 17.