CN114997400A - Neural network acceleration reasoning method - Google Patents
Neural network acceleration reasoning method
- Publication number: CN114997400A (application CN202210598354.5A)
- Authority: CN (China)
- Prior art keywords: terminal, cloud server, representing, neural network, ith
- Prior art date: 2022-05-30
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention relates to a neural network accelerated inference method, which comprises the following steps: dividing a working model into a plurality of slices; constructing an objective function whose target is to minimize the sum of the processing time and the processing energy consumption of all slices; solving the objective function for an optimal solution; caching the slices to a plurality of computing nodes according to the optimal solution, the computing nodes comprising a terminal, edge servers and a cloud server; recording the caching scheme of the working model; and performing collaborative inference among the computing nodes according to the caching scheme. By caching the working model to a plurality of computing nodes in slices and letting the nodes infer collaboratively, the invention makes full use of the computing capability and location advantage of the edge servers, reduces the computing load on the cloud server as well as the access delay and transmission delay of the inference process, and achieves strong real-time performance.
Description
Technical Field
The invention relates to a neural network accelerated inference method and belongs to the field of edge computing.
Background
A neural network is a computing system formed by many simple processing units interconnected in some fashion; it processes information through the dynamic response of its state to external input. A neural network has a self-learning capability and can infer unknown relations from data, so it can make predictions on unseen data and has been applied in many fields. However, the inference process of a neural network requires a large amount of computing resources, especially for a working model that comprises several neural networks, such as a knowledge graph or an expert knowledge base.
Because a user terminal can hardly provide the computing resources a neural network requires, the traditional approach is to place the working model in a cloud server and rely on the cloud server's strong computing power to run the inference and respond to user demands. However, the cloud server is far away from the terminal and its access is centralized, so both the access delay and the transmission delay are large, and users with high real-time requirements are difficult to serve in time.
Therefore, a neural network inference method with better real-time performance is needed.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a neural network accelerated inference method: a caching scheme is obtained by solving an objective function, the working model is cached to a plurality of computing nodes in slices, and the computing nodes perform collaborative inference. In this way the computing capability and location advantage of the edge servers are fully utilized and the computing load on the cloud server is reduced; at the same time, the time required for collaborative inference is far shorter than that of a traditional cloud computing scheme, so the real-time performance is strong and the user experience is improved.
To achieve this purpose, the invention adopts the following technical solution:
A neural network accelerated inference method comprises the following steps:
dividing a working model into a plurality of slices;
constructing an objective function whose target is to minimize the sum of the processing time and the processing energy consumption of all slices;
solving the objective function for an optimal solution;
caching the slices to a plurality of computing nodes according to the optimal solution, the computing nodes comprising a terminal, edge servers and a cloud server;
recording the caching scheme of the working model; and performing collaborative inference among the computing nodes according to the caching scheme.
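For concreteness, the caching scheme recorded in the last step can be pictured as a simple per-slice placement record. The following Python sketch is illustrative only; the class name and node encoding are assumptions, not the patent's format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlicePlacement:
    """One entry of the recorded caching scheme.

    The `node` field encodes which of the binary decision variables
    a_i (terminal), b_ij (edge server j) or c_i (cloud server) of the
    objective function below equals 1 for this slice."""
    slice_index: int
    node: str  # "terminal", "edge-<j>" or "cloud"

# A caching scheme for a working model divided into three slices:
scheme = [
    SlicePlacement(0, "terminal"),
    SlicePlacement(1, "edge-1"),
    SlicePlacement(2, "cloud"),
]
```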
Further, the objective function is formulated as:

$$\min \sum_{i=1}^{n} \left( A\,T_i + B\,E_i \right),\qquad T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci},\qquad E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where n denotes the number of slices; T_i denotes the processing time of the i-th slice; E_i denotes the processing energy consumption of the i-th slice; A and B denote weights; a_i, b_ij and c_i are decision variables; t_ai, t_bji and t_ci denote the processing time of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; e_ai, e_bij and e_ci denote the corresponding processing energy consumption; and m denotes the number of edge servers.
The constraints of the objective function are formulated as:

$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1,\qquad a_i \in \{0,1\},\; b_{ij} \in \{0,1\},\; c_i \in \{0,1\}$$

$$t^{wait}_{ai} \ge T_k,\qquad t^{wait}_{bji} \ge T_k,\qquad t^{wait}_{ci} \ge T_k,\qquad \forall k \in pred(i)$$

where t^wait_ai, t^wait_bji and t^wait_ci denote the waiting delay of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; T_k denotes the processing time of the k-th slice; and pred(i) denotes the set of predecessor slices associated with the i-th slice.
Further, the processing time includes the waiting delay of a slice, the computation delay, and the transmission delay of the slice input data.
Further, the transmission delay is expressed as:

$$t^{trans}_{ai} = \frac{S_i}{r_a},\qquad t^{trans}_{bji} = \frac{S_i}{r_{bj}},\qquad t^{trans}_{ci} = \frac{S_i}{r_c}$$

where t^trans_ai, t^trans_bji and t^trans_ci denote the transmission delay of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; S_i denotes the size of the input data of the i-th slice; r_a denotes the uplink data transmission rate of the terminal; r_bj denotes the uplink data transmission rate of the j-th edge server; and r_c denotes the data transmission rate of the cloud server.
Further, the method also includes: providing the terminal, each edge server and the cloud server with a request queue; and taking the average waiting delay of each request queue as the waiting delay of a slice on the terminal, the corresponding edge server or the cloud server.
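A minimal sketch of such a per-node request queue is given below. The class name and sliding-window size are illustrative assumptions, and the mean of recently observed waits stands in for the queue's average waiting delay:

```python
import time
from collections import deque

class RequestQueue:
    """Per-node request queue; its average waiting delay is used as the
    waiting delay of any slice cached on that node."""

    def __init__(self, window=100):
        self._pending = deque()             # enqueue timestamps of waiting requests
        self._waits = deque(maxlen=window)  # sliding window of observed waits

    def enqueue(self):
        self._pending.append(time.monotonic())

    def start_next(self):
        """Called when the node starts processing the oldest request."""
        self._waits.append(time.monotonic() - self._pending.popleft())

    def average_waiting_delay(self):
        return sum(self._waits) / len(self._waits) if self._waits else 0.0
```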
Compared with the prior art, the invention has the following features and beneficial effects:
By solving an objective function for the caching scheme, caching the working model to a plurality of computing nodes in slices, and performing collaborative inference among the computing nodes, the invention makes full use of the computing capability and location advantage of the edge servers (an edge server is closer to the terminal) and reduces the computing load on the cloud server; at the same time, the time required for collaborative inference is far shorter than that of a traditional cloud computing scheme, the real-time performance is strong, and the user experience is improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in more detail with reference to examples.
Example one
As shown in FIG. 1, a neural network accelerated inference method includes the following steps:
A working model is trained, where the working model comprises at least one neural network model. For example, the intelligent financial processing system of the patent 'An intelligent financial processing system and method' contains several neural network models, including a text extraction model, an image classification model and a document classification model.
The working model is partitioned into n slices.
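As an illustration of the partition step, for a sequential model given as an ordered list of layers the split could look like the sketch below; the equal-count split is an assumption for simplicity, and in practice the cut points would themselves be chosen by the optimization:

```python
def partition(layers, n):
    """Split an ordered list of model layers into n contiguous slices."""
    size, rem = divmod(len(layers), n)
    slices, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        slices.append(layers[start:end])
        start = end
    return slices

# e.g. partition(["conv1", "conv2", "pool", "fc1", "fc2"], 2)
# -> [['conv1', 'conv2', 'pool'], ['fc1', 'fc2']]
```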
An objective function is constructed as follows:

$$\min \sum_{i=1}^{n} \left( A\,T_i + B\,E_i \right),\qquad T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci},\qquad E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where n denotes the number of slices; T_i denotes the processing time of the i-th slice; E_i denotes the processing energy consumption of the i-th slice; A and B are weights with A + B = 1; a_i, b_ij and c_i are decision variables taking values in {0, 1}; t_ai, t_bji and t_ci denote the processing time of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; e_ai, e_bij and e_ci denote the corresponding processing energy consumption; and m denotes the number of edge servers.

The processing time on each node is the sum of a computation delay, a waiting delay and a transmission delay:

$$t_{ai} = t^{comp}_{ai} + t^{wait}_{ai} + t^{trans}_{ai},\qquad t_{bji} = t^{comp}_{bji} + t^{wait}_{bji} + t^{trans}_{bji},\qquad t_{ci} = t^{comp}_{ci} + t^{wait}_{ci} + t^{trans}_{ci}$$

where t^comp_ai, t^wait_ai and t^trans_ai denote the computation delay, waiting delay and transmission delay of the i-th slice on the terminal (the transmission delay in this embodiment refers to the transmission time of the input data required by the i-th slice); t^comp_bji, t^wait_bji and t^trans_bji denote the corresponding delays on the j-th edge server; and t^comp_ci, t^wait_ci and t^trans_ci denote the corresponding delays on the cloud server.
The terminal, each edge server and the cloud server each maintain a request queue, and the average waiting delay of a request queue is taken as the waiting delay of a slice on the corresponding node.
The computation delay and transmission delay are:

$$t^{comp}_{ai} = \frac{d_i}{f_a},\qquad t^{comp}_{bji} = \frac{d_i}{f_{bj}},\qquad t^{comp}_{ci} = \frac{d_i}{f_c}$$

$$t^{trans}_{ai} = \frac{S_i}{r_a},\qquad t^{trans}_{bji} = \frac{S_i}{r_{bj}},\qquad t^{trans}_{ci} = \frac{S_i}{r_c}$$

where d_i denotes the number of CPU clock cycles required by the i-th slice; f_a denotes the operating frequency of the terminal CPU; f_bj denotes the operating frequency of the j-th edge server CPU (in this embodiment the edge servers are assumed to have the same operating frequency); f_c denotes the operating frequency of the cloud server CPU; S_i denotes the size of the input data required by the i-th slice; r_a denotes the uplink data transmission rate of the terminal; r_bj denotes the uplink data transmission rate of the j-th edge server; and r_c denotes the data transmission rate of the cloud server.
The processing energy consumption is:

$$e_{ai} = d_i C_a,\qquad e_{bij} = d_i C_{bj},\qquad e_{ci} = d_i C_c$$

where d_i denotes the number of CPU clock cycles required by the i-th slice; C_a denotes the energy consumed by the terminal per clock cycle; C_bj denotes the energy consumed by the j-th edge server per clock cycle; and C_c denotes the energy consumed by the cloud server per clock cycle.
s.t.

$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1,\qquad a_i \in \{0,1\},\; b_{ij} \in \{0,1\},\; c_i \in \{0,1\}$$

$$t^{wait}_{ai} \ge T_k,\qquad t^{wait}_{bji} \ge T_k,\qquad t^{wait}_{ci} \ge T_k,\qquad \forall k \in pred(i)$$

where pred(i) denotes the set of predecessor slices associated with the i-th slice. The first constraint states that each slice is cached on exactly one node; the second states that the waiting time of the i-th slice is greater than or equal to the completion time of every associated slice that precedes it.
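Putting the formulas of this embodiment together, the objective value of a candidate placement can be evaluated as in the following sketch. All numbers and node parameters are illustrative assumptions, and the precedence constraint on waiting delays is collapsed into a fixed per-node `wait` term for brevity:

```python
def slice_cost(d_i, S_i, node, params, A=0.5, B=0.5):
    """Weighted time-plus-energy cost of running one slice on one node,
    using t = d_i/f + S_i/r + wait and e = d_i * C from the formulas above."""
    f, r, C, wait = params[node]
    t = d_i / f + S_i / r + wait   # computation + transmission + waiting delay
    e = d_i * C                    # cycles times energy per clock cycle
    return A * t + B * e

def total_cost(slices, placement, params, A=0.5, B=0.5):
    """Objective value sum_i (A*T_i + B*E_i) for a candidate placement.
    `placement[i]` names the single node caching slice i, which enforces
    a_i + sum_j b_ij + c_i = 1 by construction."""
    return sum(slice_cost(d, S, placement[i], params, A, B)
               for i, (d, S) in enumerate(slices))

# Toy numbers (assumptions, not from the patent): cycles d_i, input size S_i
slices = [(2e9, 1e6), (4e9, 5e5), (1e9, 2e5)]
params = {                                  # f [Hz], r [B/s], C [J/cycle], wait [s]
    "terminal": (1e9, 1e6, 1e-9, 0.00),
    "edge-1":   (3e9, 5e6, 2e-9, 0.01),
    "cloud":    (1e10, 2e6, 3e-9, 0.05),
}
print(total_cost(slices, ["terminal", "edge-1", "cloud"], params))
```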
The optimal solution of the objective function is solved, and the n slices are cached to the terminal, the edge servers and the cloud server according to the optimal solution.
The terminal records the caching scheme; when the terminal receives an inference request, collaborative inference is performed among the computing nodes according to the caching scheme.
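Collaborative inference then amounts to walking the slices in order and executing each one on the node that caches it. The sketch below is a simulation only: `run_on_node` stands in for local execution or a network call, and all names are hypothetical:

```python
def cooperative_inference(x, slices, placement, run_on_node):
    """Run slices in order, following the recorded caching scheme.

    `placement[i]` is the node caching slice i; `run_on_node(node, fn, x)`
    abstracts local execution versus an RPC to an edge or cloud server
    (here both are just a function call, not a real network stack)."""
    for i, slice_fn in enumerate(slices):
        x = run_on_node(placement[i], slice_fn, x)  # output feeds the next slice
    return x

# Simulated nodes: each slice is a plain function, "transfer" is a call.
slices = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
placement = ["terminal", "edge-1", "cloud"]
result = cooperative_inference(
    5, slices, placement, run_on_node=lambda node, fn, x: fn(x))
print(result)  # ((5 + 1) * 2) - 3 = 9
```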
Example two
Since the objective function poses an NP-hard combinatorial problem, it is solved using a genetic algorithm:
s1, randomly generating M solutions as individuals to form an initial population;
s2, calculating the objective function value of each individual as the fitness according to the objective function formula;
s3, reserving N individuals with the highest fitness as candidate solutions;
s4, generating a new candidate solution according to the candidate solution;
s5, carrying out mutation operation on the newly generated candidate solution;
s6, iteratively circulating steps S2-S5 until the fitness of the optimal individual reaches a threshold value, wherein the optimal individual is the optimal solution of the objective function.
It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Claims (6)
1. A neural network accelerated inference method, characterized by comprising the following steps:
dividing a working model into a plurality of slices;
constructing an objective function whose target is to minimize the sum of the processing time and the processing energy consumption of all slices;
solving the objective function for an optimal solution;
caching the slices to a plurality of computing nodes according to the optimal solution, the computing nodes comprising a terminal, edge servers and a cloud server;
recording the caching scheme of the working model; and performing collaborative inference among the computing nodes according to the caching scheme.
2. The neural network accelerated inference method of claim 1, wherein the objective function is formulated as:

$$\min \sum_{i=1}^{n} \left( A\,T_i + B\,E_i \right),\qquad T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci},\qquad E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where n denotes the number of slices; T_i denotes the processing time of the i-th slice; E_i denotes the processing energy consumption of the i-th slice; A and B both denote weights; a_i, b_ij and c_i are decision variables; t_ai, t_bji and t_ci denote the processing time of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; e_ai, e_bij and e_ci denote the corresponding processing energy consumption; and m denotes the number of edge servers.
3. The neural network accelerated inference method of claim 2, wherein the constraints of the objective function are formulated as:

$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1,\qquad a_i \in \{0,1\},\; b_{ij} \in \{0,1\},\; c_i \in \{0,1\}$$

$$t^{wait}_{ai} \ge T_k,\qquad t^{wait}_{bji} \ge T_k,\qquad t^{wait}_{ci} \ge T_k,\qquad \forall k \in pred(i)$$

where t^wait_ai, t^wait_bji and t^wait_ci denote the waiting delay of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; T_k denotes the processing time of the k-th slice; and pred(i) denotes the set of predecessor slices associated with the i-th slice.
4. The neural network accelerated inference method of claim 2, wherein the processing time comprises the waiting delay of a slice, the computation delay, and the transmission delay of the slice input data.
5. The neural network accelerated inference method of claim 4, wherein the transmission delay is expressed as:

$$t^{trans}_{ai} = \frac{S_i}{r_a},\qquad t^{trans}_{bji} = \frac{S_i}{r_{bj}},\qquad t^{trans}_{ci} = \frac{S_i}{r_c}$$

where t^trans_ai, t^trans_bji and t^trans_ci denote the transmission delay of the i-th slice on the terminal, the j-th edge server and the cloud server, respectively; S_i denotes the size of the input data of the i-th slice; r_a denotes the uplink data transmission rate of the terminal; r_bj denotes the uplink data transmission rate of the j-th edge server; and r_c denotes the data transmission rate of the cloud server.
6. The neural network accelerated inference method of claim 4, further comprising: providing the terminal, each edge server and the cloud server with a request queue; and taking the average waiting delay of each request queue as the waiting delay of a slice on the terminal, the corresponding edge server or the cloud server.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210598354.5A | 2022-05-30 | 2022-05-30 | Neural network acceleration reasoning method
Publications (1)

Publication Number | Publication Date
---|---
CN114997400A | 2022-09-02
Family
ID=83029859
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210598354.5A (CN114997400A, withdrawn) | Neural network acceleration reasoning method | 2022-05-30 | 2022-05-30

Country Status (1)

Country | Link
---|---
CN | CN114997400A (en)
Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN115858131A | 2023-02-22 | 2023-03-28 | | Task execution method, system, device and readable storage medium
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20220902