CN114997400A - Neural network acceleration reasoning method - Google Patents

Neural network acceleration reasoning method

Info

Publication number
CN114997400A
CN114997400A (application CN202210598354.5A)
Authority
CN
China
Prior art keywords
terminal
cloud server
representing
neural network
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210598354.5A
Other languages
Chinese (zh)
Inventor
郝占龙
陈凯
周异
黄征
陈昊
方恒凯
吴胜杰
庄国金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Shangji Network Technology Co ltd
Nanjing Shangji Enterprise Service Co ltd
Original Assignee
Xiamen Shangji Network Technology Co ltd
Nanjing Shangji Enterprise Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Shangji Network Technology Co ltd, Nanjing Shangji Enterprise Service Co ltd filed Critical Xiamen Shangji Network Technology Co ltd
Priority to CN202210598354.5A priority Critical patent/CN114997400A/en
Publication of CN114997400A publication Critical patent/CN114997400A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to a neural network acceleration reasoning method, which comprises the following steps: dividing a working model into a plurality of fragments; constructing an objective function whose goal is to minimize the sum of the processing time and the processing energy consumption of each fragment; solving the objective function for an optimal solution; caching the fragments to a plurality of computing nodes according to the optimal solution, the computing nodes comprising a terminal, edge servers and a cloud server; recording the caching scheme of the working model; and performing collaborative reasoning among the computing nodes according to the caching scheme. By caching the working model in fragments across multiple computing nodes and performing collaborative reasoning among them, the invention makes full use of the computing capability and location advantage of the edge servers, reduces the computing pressure on the cloud server as well as the access delay and transmission delay of the reasoning process, and achieves strong real-time performance.

Description

Neural network acceleration reasoning method
Technical Field
The invention relates to a neural network acceleration reasoning method, and belongs to the field of edge computing.
Background
A neural network is a computing system formed by a large number of simple processing units interconnected in some fashion; it processes information through the dynamic response of its state to external input. Because a neural network can learn on its own and infer relationships hidden in data, it can make predictions on unknown data, which has led to its adoption in many fields. However, neural network inference requires a large amount of computing resources, especially for a working model that comprises several neural networks, such as a knowledge graph or an expert knowledge base.
Because a user terminal can hardly provide the computing resources a neural network requires, the traditional approach places the working model on a cloud server and uses the server's strong computing power to perform neural network inference in response to user demands. However, the cloud server is far from the terminal and access to it is centralized, so both access delay and transmission delay are large, and users with high real-time requirements are hard to serve in time.
Therefore, a neural network inference method with better real-time performance is needed.
Disclosure of Invention
To overcome these problems in the prior art, the invention provides a neural network accelerated reasoning method: a caching scheme is obtained by solving an objective function, the working model is cached in fragments across a plurality of computing nodes, and collaborative reasoning is performed among those nodes. This makes full use of the computing capability and location advantage of the edge servers and reduces the computing pressure on the cloud server; at the same time, the time required for collaborative reasoning is far shorter than that of the traditional cloud computing scheme, giving strong real-time performance and better user experience.
To this end, the invention adopts the following technical solution:
a neural network accelerated reasoning method comprises the following steps:
dividing the working model into a plurality of fragments;
constructing an objective function with the minimum sum of the processing time of each fragment and the processing energy consumption of each fragment as a target;
solving an optimal solution of the objective function;
caching the plurality of fragments to a plurality of computing nodes respectively according to the optimal solution; the computing node comprises a terminal, an edge server and a cloud server;
recording a caching scheme of the working model; and performing collaborative reasoning among a plurality of computing nodes according to the caching scheme.
Further, the objective function is formulated as:
$$\min \sum_{i=1}^{n} \left( A\, T_i + B\, E_i \right)$$

$$T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci}$$

$$E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where $n$ represents the number of fragments; $T_i$ represents the processing time of the $i$th fragment; $E_i$ represents the processing energy consumption of the $i$th fragment; $A$ and $B$ represent weights; $a_i$, $b_{ij}$ and $c_i$ are variables; $t_{ai}$, $t_{bji}$ and $t_{ci}$ respectively represent the processing time of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $e_{ai}$, $e_{bij}$ and $e_{ci}$ respectively represent the processing energy consumption of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; and $m$ represents the number of edge servers.
The constraint of the objective function is formulated as:
$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1$$

$$a_i \in \{0,1\},\quad b_{ij} \in \{0,1\},\quad c_i \in \{0,1\}$$

$$t_{ai}^{\mathrm{wait}} \ge T_k,\quad t_{bji}^{\mathrm{wait}} \ge T_k,\quad t_{ci}^{\mathrm{wait}} \ge T_k \qquad \forall\, k \in \mathrm{pred}(i)$$

where $t_{ai}^{\mathrm{wait}}$, $t_{bji}^{\mathrm{wait}}$ and $t_{ci}^{\mathrm{wait}}$ respectively represent the waiting delay of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $T_k$ represents the processing time of the $k$th fragment; and $\mathrm{pred}(i)$ denotes the set of predecessor fragments of the $i$th fragment.
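As an illustration only, the following sketch (in Python; names such as `Costs` and `evaluate_objective` are hypothetical, since the patent prescribes no implementation) evaluates this objective for one candidate placement and checks the single-node constraint:

```python
# Illustrative sketch, not from the patent: evaluate the objective for a
# candidate placement, assuming the per-fragment costs are already known.
from dataclasses import dataclass

@dataclass
class Costs:
    t_a: list[float]         # t_ai: time of fragment i on the terminal
    t_b: list[list[float]]   # t_b[i][j]: time of fragment i on edge server j
    t_c: list[float]         # t_ci: time of fragment i on the cloud server
    e_a: list[float]         # e_ai: energy of fragment i on the terminal
    e_b: list[list[float]]   # e_b[i][j]: energy of fragment i on edge server j
    e_c: list[float]         # e_ci: energy of fragment i on the cloud server

def evaluate_objective(a, b, c, costs, A, B):
    """a[i], b[i][j], c[i] are the 0/1 placement variables; A + B = 1."""
    total = 0.0
    for i in range(len(a)):
        # Constraint: each fragment is cached on exactly one node.
        assert a[i] + sum(b[i]) + c[i] == 1, "infeasible placement"
        T_i = (a[i] * costs.t_a[i]
               + sum(b[i][j] * costs.t_b[i][j] for j in range(len(b[i])))
               + c[i] * costs.t_c[i])
        E_i = (a[i] * costs.e_a[i]
               + sum(b[i][j] * costs.e_b[i][j] for j in range(len(b[i])))
               + c[i] * costs.e_c[i])
        total += A * T_i + B * E_i
    return total
```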
Further, the processing time comprises the waiting delay of the fragment, the computation delay, and the transmission delay of the fragment's input data.
Further, the transmission delay is expressed by the formula:
$$t_{ai}^{\mathrm{trans}} = \frac{S_i}{r_a},\qquad t_{bji}^{\mathrm{trans}} = \frac{S_i}{r_{bj}},\qquad t_{ci}^{\mathrm{trans}} = \frac{S_i}{r_c}$$

where $t_{ai}^{\mathrm{trans}}$, $t_{bji}^{\mathrm{trans}}$ and $t_{ci}^{\mathrm{trans}}$ respectively represent the transmission delay of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $S_i$ represents the size of the $i$th fragment's input data; $r_a$ represents the uplink data transmission rate of the terminal; $r_{bj}$ represents the uplink data transmission rate of the $j$th edge server; and $r_c$ represents the data transmission rate of the cloud server.
Further, the method also comprises: the terminal, each edge server and the cloud server are each provided with a request queue; and the average waiting delay of a node's request queue is calculated and used as the waiting delay of a fragment on that terminal, edge server or cloud server.
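For example, the waiting delay might be tracked as a running average over recently served requests, as in this sketch (the patent does not fix a concrete estimator; the window size is an assumption):

```python
# Illustrative sketch: one request queue per compute node; its average
# waiting time serves as that node's waiting delay t_wait.
from collections import deque

class RequestQueue:
    def __init__(self, window=100):
        # Waiting times (seconds) of the most recent `window` requests.
        self.waits = deque(maxlen=window)

    def record_wait(self, wait_s):
        self.waits.append(wait_s)

    def average_waiting_delay(self):
        return sum(self.waits) / len(self.waits) if self.waits else 0.0
```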
Compared with the prior art, the invention has the following features and beneficial effects:
By solving an objective function to obtain the caching scheme, caching the working model in fragments across a plurality of computing nodes, and performing collaborative reasoning among those nodes, the invention makes full use of the computing capability and location advantage of the edge servers (which are closer to the terminal) and reduces the computing pressure on the cloud server; at the same time, the time required for collaborative reasoning is far shorter than with the traditional cloud computing scheme, so real-time performance is strong and user experience is improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in more detail with reference to examples.
Example one
As shown in fig. 1, a neural network accelerated reasoning method includes the following steps:
A working model is trained; the working model comprises at least one neural network model. For example, the intelligent financial processing system in the patent "An intelligent financial processing system and method" comprises several neural network models, such as a text extraction model, an image classification model and a document classification model.
The working model is partitioned into n fragments.
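The patent does not specify how the model is cut; for a working model whose networks form an ordered pipeline, one plausible sketch simply splits the layer sequence into n contiguous fragments (an even split, chosen here only for illustration):

```python
def partition(layers, n):
    """Split an ordered list of model layers into n contiguous fragments.
    Illustrative only: a real system might cut at low-traffic boundaries
    instead of evenly."""
    size, rem = divmod(len(layers), n)
    fragments, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        fragments.append(layers[start:end])
        start = end
    return fragments
```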
The objective function is constructed as follows:

$$\min \sum_{i=1}^{n} \left( A\, T_i + B\, E_i \right)$$

$$T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci},\qquad E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where $n$ represents the number of fragments; $T_i$ represents the processing time of the $i$th fragment; $E_i$ represents the processing energy consumption of the $i$th fragment; $A$ and $B$ represent weights with $A + B = 1$; $a_i$, $b_{ij}$ and $c_i$ are binary variables taking values in $\{0,1\}$ that indicate whether the $i$th fragment is cached on the terminal, the $j$th edge server or the cloud server; $t_{ai}$, $t_{bji}$ and $t_{ci}$ respectively represent the processing time of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $e_{ai}$, $e_{bij}$ and $e_{ci}$ respectively represent the processing energy consumption of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; and $m$ represents the number of edge servers.
The processing time on each node decomposes into computation, waiting and transmission delays:

$$t_{ai} = t_{ai}^{\mathrm{comp}} + t_{ai}^{\mathrm{wait}} + t_{ai}^{\mathrm{trans}}$$

$$t_{bji} = t_{bji}^{\mathrm{comp}} + t_{bji}^{\mathrm{wait}} + t_{bji}^{\mathrm{trans}}$$

$$t_{ci} = t_{ci}^{\mathrm{comp}} + t_{ci}^{\mathrm{wait}} + t_{ci}^{\mathrm{trans}}$$

where $t_{ai}^{\mathrm{comp}}$, $t_{ai}^{\mathrm{wait}}$ and $t_{ai}^{\mathrm{trans}}$ denote the computation delay, waiting delay and transmission delay of the $i$th fragment on the terminal (in this embodiment, the transmission delay refers to the transmission time of the input data required by the $i$th fragment); $t_{bji}^{\mathrm{comp}}$, $t_{bji}^{\mathrm{wait}}$ and $t_{bji}^{\mathrm{trans}}$ denote the computation, waiting and transmission delays of the $i$th fragment on the $j$th edge server; and $t_{ci}^{\mathrm{comp}}$, $t_{ci}^{\mathrm{wait}}$ and $t_{ci}^{\mathrm{trans}}$ denote the computation, waiting and transmission delays of the $i$th fragment on the cloud server.
The terminal, each edge server and the cloud server are each provided with a request queue, and the average waiting delay of a node's request queue is used as the waiting delay of a fragment on that node, giving $t_{ai}^{\mathrm{wait}}$, $t_{bji}^{\mathrm{wait}}$ and $t_{ci}^{\mathrm{wait}}$.
The computation delays are:

$$t_{ai}^{\mathrm{comp}} = \frac{d_i}{f_a},\qquad t_{bji}^{\mathrm{comp}} = \frac{d_i}{f_{bj}},\qquad t_{ci}^{\mathrm{comp}} = \frac{d_i}{f_c}$$

where $d_i$ represents the number of CPU clock cycles required by the $i$th fragment; $f_a$ represents the operating frequency of the terminal CPU; $f_{bj}$ represents the operating frequency of the $j$th edge server CPU (in this embodiment, the operating frequencies of the edge servers are assumed to be the same); and $f_c$ represents the operating frequency of the cloud server CPU.
The transmission delays are:

$$t_{ai}^{\mathrm{trans}} = \frac{S_i}{r_a},\qquad t_{bji}^{\mathrm{trans}} = \frac{S_i}{r_{bj}},\qquad t_{ci}^{\mathrm{trans}} = \frac{S_i}{r_c}$$

where $S_i$ represents the size of the input data required by the $i$th fragment; $r_a$ represents the uplink data transmission rate of the terminal; $r_{bj}$ represents the uplink data transmission rate of the $j$th edge server; and $r_c$ represents the data transmission rate of the cloud server.
The processing energy consumptions are:

$$e_{ai} = d_i C_a,\qquad e_{bij} = d_i C_{bj},\qquad e_{ci} = d_i C_c$$

where $d_i$ represents the number of CPU clock cycles required by the $i$th fragment; $C_a$ represents the energy consumed by the terminal per clock cycle; $C_{bj}$ represents the energy consumed by the $j$th edge server per clock cycle; and $C_c$ represents the energy consumed by the cloud server per clock cycle.
s.t.

$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1$$

$$a_i \in \{0,1\},\quad b_{ij} \in \{0,1\},\quad c_i \in \{0,1\}$$

$$t_{ai}^{\mathrm{wait}} \ge T_k,\quad t_{bji}^{\mathrm{wait}} \ge T_k,\quad t_{ci}^{\mathrm{wait}} \ge T_k \qquad \forall\, k \in \mathrm{pred}(i)$$

where $\mathrm{pred}(i)$ denotes the set of predecessor fragments of the $i$th fragment. The last constraint states that the $i$th fragment cannot start until all of its predecessor fragments have completed, i.e. its waiting delay is at least the processing time of every predecessor.
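Gathering the embodiment's formulas, the processing time and energy of one fragment on one node follow directly from $d_i$, $S_i$ and the node's parameters; a minimal sketch (variable names illustrative):

```python
def node_costs(d_i, S_i, f, r, C, t_wait):
    """Cost of one fragment on one node, per the embodiment's formulas.
    d_i: CPU cycles required; S_i: input data size; f: CPU frequency;
    r: data transmission rate; C: energy per clock cycle; t_wait: the
    node's average queue waiting delay."""
    t_comp = d_i / f          # computation delay
    t_trans = S_i / r         # transmission delay of the input data
    t = t_comp + t_wait + t_trans   # t_ai / t_bji / t_ci
    e = d_i * C               # e_ai / e_bij / e_ci
    return t, e
```

Calling `node_costs` with the terminal's, an edge server's or the cloud server's parameters yields $(t_{ai}, e_{ai})$, $(t_{bji}, e_{bij})$ or $(t_{ci}, e_{ci})$, respectively.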
The optimal solution of the objective function is solved, and the n fragments are cached to the terminal, the edge servers and the cloud server according to the optimal solution.
The terminal records the caching scheme; when the terminal receives an inference request, collaborative inference is performed among the computing nodes according to the caching scheme.
Example two
Since solving the objective function is an NP-hard problem, a genetic algorithm is used (a sketch follows the steps below):
s1, randomly generating M solutions as individuals to form an initial population;
s2, calculating the objective function value of each individual as the fitness according to the objective function formula;
s3, reserving N individuals with the highest fitness as candidate solutions;
s4, generating a new candidate solution according to the candidate solution;
s5, carrying out mutation operation on the newly generated candidate solution;
s6, iteratively circulating steps S2-S5 until the fitness of the optimal individual reaches a threshold value, wherein the optimal individual is the optimal solution of the objective function.
It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Claims (6)

1. A neural network acceleration reasoning method is characterized by comprising the following steps:
dividing the working model into a plurality of fragments;
constructing an objective function with the minimum sum of the processing time of each fragment and the processing energy consumption of each fragment as a target;
solving an optimal solution of the objective function;
caching the plurality of fragments to a plurality of computing nodes respectively according to the optimal solution; the computing node comprises a terminal, an edge server and a cloud server;
recording a caching scheme of the working model; and performing collaborative reasoning among a plurality of computing nodes according to the caching scheme.
2. The neural network accelerated reasoning method of claim 1, wherein the objective function is formulated as:
$$\min \sum_{i=1}^{n} \left( A\, T_i + B\, E_i \right)$$

$$T_i = a_i t_{ai} + \sum_{j=1}^{m} b_{ij} t_{bji} + c_i t_{ci}$$

$$E_i = a_i e_{ai} + \sum_{j=1}^{m} b_{ij} e_{bij} + c_i e_{ci}$$

where $n$ represents the number of fragments; $T_i$ represents the processing time of the $i$th fragment; $E_i$ represents the processing energy consumption of the $i$th fragment; $A$ and $B$ both represent weights; $a_i$, $b_{ij}$ and $c_i$ are variables; $t_{ai}$, $t_{bji}$ and $t_{ci}$ respectively represent the processing time of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $e_{ai}$, $e_{bij}$ and $e_{ci}$ respectively represent the processing energy consumption of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; and $m$ represents the number of edge servers.
3. The neural network accelerated reasoning method of claim 2, wherein the constraint condition of the objective function is formulated as:
$$a_i + \sum_{j=1}^{m} b_{ij} + c_i = 1$$

$$a_i \in \{0,1\},\quad b_{ij} \in \{0,1\},\quad c_i \in \{0,1\}$$

$$t_{ai}^{\mathrm{wait}} \ge T_k,\quad t_{bji}^{\mathrm{wait}} \ge T_k,\quad t_{ci}^{\mathrm{wait}} \ge T_k \qquad \forall\, k \in \mathrm{pred}(i)$$

where $t_{ai}^{\mathrm{wait}}$, $t_{bji}^{\mathrm{wait}}$ and $t_{ci}^{\mathrm{wait}}$ respectively represent the waiting delay of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $T_k$ represents the processing time of the $k$th fragment; and $\mathrm{pred}(i)$ denotes the set of predecessor fragments of the $i$th fragment.
4. The neural network accelerated reasoning method of claim 2, wherein the processing time comprises the waiting delay of the fragment, the computation delay, and the transmission delay of the fragment's input data.
5. The neural network accelerated inference method of claim 4, wherein the transmission delay is expressed by a formula:
$$t_{ai}^{\mathrm{trans}} = \frac{S_i}{r_a},\qquad t_{bji}^{\mathrm{trans}} = \frac{S_i}{r_{bj}},\qquad t_{ci}^{\mathrm{trans}} = \frac{S_i}{r_c}$$

where $t_{ai}^{\mathrm{trans}}$, $t_{bji}^{\mathrm{trans}}$ and $t_{ci}^{\mathrm{trans}}$ respectively represent the transmission delay of the $i$th fragment on the terminal, the $j$th edge server and the cloud server; $S_i$ represents the size of the $i$th fragment's input data; $r_a$ represents the uplink data transmission rate of the terminal; $r_{bj}$ represents the uplink data transmission rate of the $j$th edge server; and $r_c$ represents the data transmission rate of the cloud server.
6. The neural network accelerated reasoning method of claim 4, further comprising: providing the terminal, each edge server and the cloud server with a request queue; and calculating the average waiting delay of the request queue as the waiting delay of a fragment on the terminal, each edge server and the cloud server.
CN202210598354.5A 2022-05-30 2022-05-30 Neural network acceleration reasoning method Withdrawn CN114997400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210598354.5A CN114997400A (en) 2022-05-30 2022-05-30 Neural network acceleration reasoning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210598354.5A CN114997400A (en) 2022-05-30 2022-05-30 Neural network acceleration reasoning method

Publications (1)

Publication Number Publication Date
CN114997400A true CN114997400A (en) 2022-09-02

Family

ID=83029859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210598354.5A Withdrawn CN114997400A (en) 2022-05-30 2022-05-30 Neural network acceleration reasoning method

Country Status (1)

Country Link
CN (1) CN114997400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858131A (en) * 2023-02-22 2023-03-28 山东海量信息技术研究院 Task execution method, system, device and readable storage medium


Similar Documents

Publication Publication Date Title
CN113242568B (en) Task unloading and resource allocation method in uncertain network environment
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
CN109885397B (en) Delay optimization load task migration algorithm in edge computing environment
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN112598150B (en) Method for improving fire detection effect based on federal learning in intelligent power plant
US11784931B2 (en) Network burst load evacuation method for edge servers
CN110717300A (en) Edge calculation task allocation method for real-time online monitoring service of power internet of things
CN111813506A (en) Resource sensing calculation migration method, device and medium based on particle swarm algorithm
Vakilian et al. Using the cuckoo algorithm to optimizing the response time and energy consumption cost of fog nodes by considering collaboration in the fog layer
CN110162390B (en) Task allocation method and system for fog computing system
Liu et al. Fedpa: An adaptively partial model aggregation strategy in federated learning
CN115374853A (en) Asynchronous federal learning method and system based on T-Step polymerization algorithm
CN113590279A (en) Task scheduling and resource allocation method for multi-core edge computing server
CN114997400A (en) Neural network acceleration reasoning method
CN116366576A (en) Method, device, equipment and medium for scheduling computing power network resources
CN113139639B (en) MOMBI-oriented smart city application multi-target computing migration method and device
CN117478697A (en) Industrial Internet data sharing optimization method based on intelligent slicing decision block chain
Yuan et al. A DRL-Based Container Placement Scheme with Auxiliary Tasks.
EP4202676A1 (en) Method and apparatus for multi-task scheduling, device and storage medium
Ren et al. Balanced allocation method of physical education distance education resources based on linear prediction
Sun et al. Aledar: An attentions-based encoder-decoder and autoregressive model for workload forecasting of cloud data center
CN112764932A (en) Deep reinforcement learning-based calculation-intensive workload high-energy-efficiency distribution method
CN113157344A (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Xu et al. Machine Learning for Interconnect Network Traffic Forecasting: Investigation and Exploitation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220902