CN115706703A - Edge AI acceleration processing method and device, electronic equipment and readable storage medium

Edge AI acceleration processing method and device, electronic equipment and readable storage medium

Info

Publication number
CN115706703A
CN115706703A
Authority
CN
China
Prior art keywords
edge
artificial intelligence
network
data
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110930879.XA
Other languages
Chinese (zh)
Inventor
丁良奎 (Ding Liangkui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Xiongan ICT Co Ltd, China Mobile System Integration Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110930879.XA priority Critical patent/CN115706703A/en
Publication of CN115706703A publication Critical patent/CN115706703A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an edge AI acceleration processing method and apparatus, an electronic device and a readable storage medium, wherein the method comprises the following steps: constructing a blockchain-integrated edge computing network; segmenting the edge computing network based on its structure to obtain segmentation points; uploading feature data to an edge platform at a segmentation point; and processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding the computation result back to the mobile device. The feature data is uploaded to the edge platform at the segmentation point and then processed; the data is transmitted to the edge-platform blockchain through edge devices, analyzed, and the inference result is fed back to the user equipment. Decentralized edge AI acceleration reduces network transmission cost and latency, and blockchain-integrated edge AI acceleration makes data transmission safer and better protects user privacy.

Description

Edge AI acceleration processing method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to an edge AI acceleration processing method and apparatus, an electronic device, and a readable storage medium.
Background
At present, there are two main methods for running artificial intelligence applications: first, completing the entire forward inference process of the AI algorithm on the mobile terminal; and second, uploading the data to the cloud, performing forward inference on a cloud or edge-cloud server, and feeding the computation result back to the mobile terminal.
The first method is mostly used for applications running on mobile devices, such as face recognition and automatic driving. It uses a forward neural-network acceleration framework to port a network model trained on a PC to the mobile device and completes the entire inference process on the mobile device's CPU, GPU and DSP hardware, or uses a neural-network back-end acceleration library to invoke the hardware devices to accelerate AI operations.
The hardware performance on mobile devices is far less than that of servers, and therefore many lightweight networks have been created specifically for mobile devices. However, these networks have limited accuracy and are not applicable in many fields.
The second method is commonly used in the industry to process image, video-stream, speech and text data with deep learning techniques. It uses powerful GPU cluster resources on a cloud server to complete the network inference process, or uses an edge-cloud server close to the mobile terminal to reduce network latency and accelerate inference. After the inference process finishes, the result is fed back to the mobile device.
Beyond the two approaches above, a third approach is proposed in the paper "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge": the forward neural network is intelligently partitioned at network-layer granularity according to the algorithm model, the current network condition, the hardware of the mobile device, and so on, with one part running on the mobile device and the other part running on the cloud server. This reduces system latency and reduces the power the mobile device consumes for computation. Intelligently partitioning the algorithm model according to the real-time situation and completing the inference process through cooperation between the mobile terminal and the edge computing platform is an effective method, but it has the following defects: 1) security issues in the computation-sharing process; 2) privacy of the network model, its parameters and the user data; 3) trust problems in multi-node data sharing.
Disclosure of Invention
The invention provides an edge AI acceleration processing method and device, an electronic device and a readable storage medium, which are used for solving the technical defects in the prior art.
The invention provides an edge AI acceleration processing method, which comprises the following steps:
constructing a blockchain-integrated edge computing network;
based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
uploading the feature data to an edge platform at the segmentation point;
and processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding the computation result back to the mobile device.
The edge AI acceleration processing method according to the invention, wherein processing the feature data based on the preset FPGA-based artificial intelligence model conversion rules and tools comprises:
splitting the channel-dimension features according to a first vector and a second vector using a tile technique, inputting the split features into a plurality of non-blocking parallel computing processing units, fetching a corresponding tile according to the number of computing processing units in each clock cycle, and recombining the tiles into a feature map in an on-chip cache through a blocking channel.
The edge AI acceleration processing method according to the present invention, wherein processing the feature data based on the preset FPGA-based artificial intelligence model conversion rules and tools further comprises:
layering the artificial intelligence model by convolution, wherein when an artificial intelligence model has N layers of convolution operations, each operand in the .h header file is described as an array of length num, with num = N;
after each layer of features is loaded into a cache, the features are split in tile form and are all input into the computing units after a number of clock cycles; while the feature values are being input into the computing units, the next layer's parameters are preloaded into the cache;
and storing the trained operator model parameters of each layer sequentially in binary file form according to the channel dimension.
The edge AI acceleration processing method according to the present invention, wherein after segmenting the edge computing network to obtain segmentation points, the method comprises:
verifying the chain nodes of the blockchain.
The edge AI acceleration processing method according to the present invention, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
judging the network condition, and in response to the network-condition goodness being greater than a preset threshold, uploading the feature data to the edge platform at the segmentation point.
The edge AI acceleration processing method according to the present invention, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
adjusting the segmentation point according to the server load, and uploading the feature data to the edge platform at the adjusted segmentation point.
The edge AI acceleration processing method according to the present invention, wherein before feeding the computation result back to the mobile device, the method comprises:
recording a credit increase for the edge computing network on the blockchain based on the computation result.
The present invention also provides an edge AI acceleration processing apparatus, including:
the network construction module is used for constructing a blockchain-integrated edge computing network;
the network segmentation module is used for segmenting the edge computing network based on the structure of the edge computing network to obtain segmentation points;
the data transmission module is used for uploading the feature data to the edge platform at the segmentation point;
and the data processing module is used for processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools and feeding the computation result back to the mobile device.
The present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the edge AI acceleration processing method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the edge AI acceleration processing method as described in any of the above.
The invention segments the edge computing network to obtain segmentation points, uploads the feature data to the edge platform at a segmentation point, and processes the feature data based on the preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools; the data is transmitted to the edge-platform blockchain through edge devices, analyzed, and the inference result is fed back to the user equipment. Decentralized edge AI acceleration reduces network transmission cost and latency, and blockchain-integrated edge AI acceleration makes data transmission safer and better protects user privacy.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an edge AI acceleration processing method according to the present invention;
FIG. 2 is a schematic structural diagram of an edge AI accelerated processing device according to the present invention;
fig. 3 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
An edge AI (Artificial Intelligence) acceleration processing method according to the present invention is described below with reference to fig. 1, and includes:
s1, constructing an edge computing network of a fusion block chain;
the constructed edge computing network is based on software definition and a virtual distributed block chain, edge artificial intelligence computing does not perform artificial intelligence application processing based on a data center on a cloud any more, but keeps data local to equipment, and meanwhile transfers partial work of artificial intelligence to the equipment, and at the moment, the edge artificial intelligence computing faces a large amount of heterogeneous computing resources and a large amount of nodes with possibility of being hijacked. Therefore, the traditional framework suitable for the centralized artificial intelligence of the cloud is not suitable for direct application, and a novel edge framework using a block chain as a support is provided.
To ensure user data privacy, in a blockchain-based mobile edge computing network, mobile devices are linked to the mobile edge network through 5G base stations and communicate with the nearest edge server; a shared key allows them to communicate. Communication authentication is based on identity management and access control: generally, a central authority is set up in a domain to manage identities, or identities are authenticated in a distributed manner; if an identity is authenticated as reliable, its allowed behavior is trusted.
The edge servers are primarily responsible for local network control, providing outsourced data storage and computation for local blockchain devices in a secure manner; some of them are incorporated into higher-level server blockchains. Docker containers are deployed on the edge nodes (typically edge servers), which execute smart contracts to ensure that transactions are securely and properly validated. The edge nodes use different types of blockchains according to different trust, privacy and fault-tolerance requirements. All Docker containers are uniformly managed by a Kubernetes cluster, which is responsible for managing and downloading the application programs.
Using various encryption algorithms, the edge nodes can establish cooperative relationships among multiple parties while guaranteeing privacy. Adding a new block to the blockchain requires an established consensus mechanism: every block records and stores a data link to the previous block, and a new block is appended to the ledger only if the corresponding message passes authentication by most participants. This mechanism design ensures good robustness under single-point failure and prevents malicious tampering with the data.
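For illustration only, the following minimal C++ sketch shows a hash-linked block structure appended under a simple majority vote; the field names, the toy hash and the voting rule are placeholders, since the embodiment does not prescribe a concrete block layout or consensus protocol.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of the hash-linked ledger described above. Field names
// are illustrative placeholders, not the patent's block layout.
struct Block {
    std::uint64_t index;      // position in the ledger
    std::string prevHash;     // link to the previous block
    std::string payload;      // e.g. shared feature data or a credit record
    std::string hash;         // hash over (index, prevHash, payload)
};

std::string toyHash(const Block& b) {
    // std::hash stands in for a real cryptographic hash (e.g. SHA-256).
    std::size_t h = std::hash<std::string>{}(
        std::to_string(b.index) + b.prevHash + b.payload);
    return std::to_string(h);
}

// A block is appended only after consensus; a simple majority vote of
// participants stands in for the consensus mechanism here.
bool appendBlock(std::vector<Block>& chain, Block b, int votesFor, int participants) {
    if (2 * votesFor <= participants) return false;  // no majority, reject
    b.prevHash = chain.empty() ? std::string("genesis") : chain.back().hash;
    b.index = chain.size();
    b.hash = toyHash(b);
    chain.push_back(std::move(b));
    return true;
}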
S2, based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
a point suitable for segmentation can be found from the edge computing network to split the edge computing network, with the first half running on the mobile device.
S3, uploading the feature data to an edge platform at the segmentation point;
and uploading the characteristic data to the edge platform at the segmentation point through a network, calculating the latter half part by the platform, and feeding back the result to the mobile terminal.
And S4, processing the feature data based on preset AI (Artificial Intelligence) model conversion rules and tools of the field programmable gate array, and feeding the computation result back to the mobile device.
The goal of AI acceleration based on the edge computing platform is to provide real-time computation, and the blockchain technology is introduced mainly to ensure the security of data and algorithms; however, this introduction increases time consumption to some extent. To guarantee the acceleration capability of edge AI, an FPGA (Field-Programmable Gate Array) is introduced to accelerate both the blockchain and the AI model.
Unlike the encryption and decryption algorithms of the blockchain, in artificial intelligence the AI edge computing network is complex and changeable: different edge computing networks are usually used in different application scenarios, and the same model may also be optimized and adjusted over time. The FPGA, however, is a semi-custom circuit, a programmable logic gate array, which means that replacing the edge computing network would bring a great deal of work to the algorithm-porting personnel. To solve this problem, a set of FPGA-based AI model conversion rules and tools is designed. Different deep learning frameworks have different edge computing network description files, such as the prototxt used by Caffe; when such a description is converted into a model suitable for an FPGA, it needs to become a C++-based .h header file on the host side that the FPGA host program can understand. The algorithm conversion rules mainly refer to the rules for generating this control-flow .h file, and the edge computing network and its hyper-parameters are stored and protected on the edge blockchain.
The invention segments the edge computing network to obtain segmentation points, uploads the feature data to the edge platform at a segmentation point, and processes the feature data based on the preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools; the data is transmitted to the edge-platform blockchain through edge devices, analyzed, and the inference result is fed back to the user equipment. Decentralized edge AI acceleration reduces network transmission cost and latency, and blockchain-integrated edge AI acceleration makes data transmission safer and better protects user privacy.
The edge AI acceleration processing method according to the invention, wherein processing the feature data based on the preset FPGA-based artificial intelligence model conversion rules and tools comprises:
splitting the channel-dimension features according to a first vector and a second vector using a tile technique, inputting the split features into a plurality of non-blocking parallel computing processing units, fetching a corresponding tile according to the number of computing processing units in each clock cycle, and recombining the tiles into a feature map in an on-chip cache through a blocking channel.
On-chip storage resources of the FPGA are limited, so it is difficult to load and compute all the pictures and features at once as a GPU (Graphics Processing Unit) does. Therefore, a tile technique is adopted: the features in CHW layout (with dimensions C, H and W) are split according to Cvec (the C-direction vector width, i.e. how many pixels are taken in the C direction) and Wvec (the W-direction vector width), and the split tiles are input into a plurality of non-blocking parallel PEs (processing elements); in each clock cycle a corresponding tile is fetched according to the number of PEs, and finally the tiles are recombined into a new feature map in the on-chip cache through a blocking channel.
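A host-side C++ sketch of this tiling step is given below, assuming a float feature map in CHW layout and, for brevity, tiles spanning all of H; the actual tile geometry and kernel-side implementation are not fixed by the embodiment.

#include <cstddef>
#include <vector>

// One tile cut from the CHW feature map; it would be streamed to a PE.
struct Tile {
    std::size_t c0, w0;        // origin of the tile in (C, W)
    std::vector<float> data;   // up to Cvec * H * Wvec values
};

// Cut a CHW feature map into Cvec x Wvec tiles (whole H rows here).
std::vector<Tile> splitCHW(const std::vector<float>& fm,
                           std::size_t C, std::size_t H, std::size_t W,
                           std::size_t Cvec, std::size_t Wvec) {
    std::vector<Tile> tiles;
    for (std::size_t c0 = 0; c0 < C; c0 += Cvec) {
        for (std::size_t w0 = 0; w0 < W; w0 += Wvec) {
            Tile t{c0, w0, {}};
            for (std::size_t c = c0; c < c0 + Cvec && c < C; ++c)
                for (std::size_t h = 0; h < H; ++h)
                    for (std::size_t w = w0; w < w0 + Wvec && w < W; ++w)
                        t.data.push_back(fm[(c * H + h) * W + w]);  // CHW indexing
            tiles.push_back(std::move(t));  // one tile fetched per PE per cycle
        }
    }
    return tiles;
}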
Taking the AlexNet network structure as an example, a neural network computes feature maps through a number of operators; each layer has different input and output parameters, requires a different number of cycles, and has a different data flow direction, so a set of rules needs to be designed to achieve flexible algorithm porting.
The edge AI acceleration processing method according to the present invention, wherein processing the feature data based on the preset FPGA-based artificial intelligence model conversion rules and tools further comprises:
layering the artificial intelligence model by convolution, wherein when an artificial intelligence model has N layers of convolution operations, each operand in the .h header file is described as an array of length num, with num = N;
after each layer of features is loaded into a cache, the features are split in tile form and are all input into the computing units after a number of clock cycles; while the feature values are being input into the computing units, the next layer's parameters are preloaded into the cache;
and storing the trained operator model parameters of each layer sequentially in binary file form according to the channel dimension.
The edge computing network can be layered by convolution: if an edge computing network has N layers of convolution operations (a fully connected layer can be regarded as a special convolution), the description of each operand in the .h file is an array of length num, with num = N. AlexNet, for example, has 5 convolution layers, so N = 5. Each entry of the array is a specific description of the corresponding layer's parameters.
As described for the per-layer normalization function in AlexNet: CONSTANT bool norm_enabled[NUM_CONVOLUTIONS] = {true, true, false, false, false}; it can be seen that after the first two convolution layers the computation passes through the normalization function module, while the last three layers do not.
As described for the per-layer filters (model convolution parameters) in AlexNet: CONSTANT int filter_full_size[NUM_CONVOLUTIONS] = {11 * 11, 5 * 5, 3 * 3, 3 * 3, 3 * 3}; each entry describes the filter size of the corresponding layer.
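Putting such per-layer arrays together, a generated host-side control header might look like the hypothetical excerpt below. Only norm_enabled and the filter sizes are described above; NUM_CONVOLUTIONS, the CONSTANT macro and the remaining arrays are illustrative assumptions modeled on the public AlexNet structure.

// Hypothetical excerpt of a generated control-flow header (.h) for a
// five-convolution network such as AlexNet; values are illustrative.
#define NUM_CONVOLUTIONS 5
#define CONSTANT constexpr

CONSTANT bool norm_enabled[NUM_CONVOLUTIONS]     = {true, true, false, false, false};
CONSTANT int  filter_full_size[NUM_CONVOLUTIONS] = {11 * 11, 5 * 5, 3 * 3, 3 * 3, 3 * 3};
CONSTANT bool pool_enabled[NUM_CONVOLUTIONS]     = {true, true, false, false, true};
CONSTANT int  input_channels[NUM_CONVOLUTIONS]   = {3, 96, 256, 384, 384};
CONSTANT int  output_channels[NUM_CONVOLUTIONS]  = {96, 256, 384, 384, 256};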
The entire FPGA-based AI acceleration software design adopts a pipe-and-filter architecture.
As mentioned above, after each layer of features is loaded into the cache, the features are split in tile form and are all fed into the computing units after

Cycle = (C × W × H) / (Cvec × Wvec × Kvec)

clock cycles. To make full use of the hardware resources, a pipelined mode is used: while the feature values are being input into the computing units, the next layer's parameters are preloaded into the cache. The maximum number of cycles this preload may require is

Cycle(next) = (K × P × Q) / (Cvec × Wvec × Kvec),

so the number of clock cycles required to complete an operator is

Cycle = (Cycle > Cycle(next)) ? Cycle : Cycle(next);
The host side is responsible for invoking each operator in the FPGA in turn according to its Cycle count.
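The cycle bookkeeping above can be restated as a small host-side C++ helper; the struct and parameter names are placeholders, and integer division stands in for whatever rounding the real pipeline applies.

#include <algorithm>
#include <cstdint>

// C, H, W describe the current operator's feature map; K, P, Q describe
// the next layer's parameter volume; all values are illustrative.
struct OperatorShape {
    std::uint64_t C, H, W;
    std::uint64_t K, P, Q;
};

std::uint64_t operatorCycles(const OperatorShape& s,
                             std::uint64_t Cvec, std::uint64_t Wvec, std::uint64_t Kvec) {
    const std::uint64_t vec = Cvec * Wvec * Kvec;
    const std::uint64_t feedCycles    = (s.C * s.W * s.H) / vec;  // stream features to the PEs
    const std::uint64_t preloadCycles = (s.K * s.P * s.Q) / vec;  // preload next layer's parameters
    // The two run in a pipeline, so the operator is bound by the slower one.
    return std::max(feedCycles, preloadCycles);
}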
The operator model parameters of each layer are stored sequentially as binary .bin files according to the NCHW dimension order.
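As a sketch, dumping one layer's parameters in the order just described might look as follows; the use of 32-bit floats and a caller-supplied path are assumptions, since the embodiment only specifies binary files in channel-dimension (NCHW) order.

#include <cstddef>
#include <cstdio>
#include <vector>

// Write one layer's trained parameters to a binary .bin file. The buffer
// is assumed to be laid out as [N][C][H][W] already, so one sequential
// write preserves the required dimension order.
bool dumpLayerParams(const char* path, const std::vector<float>& weightsNCHW) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written = std::fwrite(weightsNCHW.data(), sizeof(float),
                                            weightsNCHW.size(), f);
    std::fclose(f);
    return written == weightsNCHW.size();
}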
The edge AI acceleration processing method according to the present invention, wherein after segmenting the edge computing network to obtain segmentation points, the method comprises:
verifying the chain nodes of the blockchain. A user invoking edge acceleration must first pass on-chain authentication; since credit is involved, the chain nodes of the blockchain also need to be verified. The edge consensus mechanism increases the incentive for users to share data.
The edge AI acceleration processing method according to the present invention, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
judging the network condition, and in response to the network-condition goodness being greater than a preset threshold, uploading the feature data to the edge platform at the segmentation point. The network-condition goodness refers to the degree of connectivity: if the network condition is good, the feature data is uploaded to the edge platform at the segmentation point; if the goodness is not greater than the preset threshold, the method skips directly to the credit-increase on-chain step without uploading the feature data to the edge platform.
The edge AI acceleration processing method according to the present invention, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
adjusting the segmentation point according to the server load, and uploading the feature data to the edge platform at the adjusted segmentation point. Adjusting the segmentation point according to the server load takes load balancing into account.
The edge AI acceleration processing method according to the present invention, wherein before feeding the computation result back to the mobile device, the method comprises:
recording a credit increase for the edge computing network on the blockchain based on the computation result.
The credibility of positive, useful results is increased, and information with high credibility participates in multi-user decision making; the data is transmitted to the edge-platform blockchain through edge 5G devices, analyzed, and the inference result is fed back to the user equipment. This solves the credibility problem of multi-node information decision making.
Referring to fig. 2, the following describes an edge AI acceleration processing apparatus according to the present invention, and the edge AI acceleration processing apparatus described below and the edge AI acceleration processing method described above may be referred to in correspondence, where the edge AI acceleration processing apparatus includes:
a network construction module 10, configured to construct a blockchain-integrated edge computing network;
the constructed edge computing network is based on software definition and a virtual distributed block chain, edge artificial intelligence computing does not process artificial intelligence application based on a data center on a cloud any more, but keeps data local to equipment, and meanwhile, partial work of artificial intelligence is transferred to the equipment, and at the moment, the edge artificial intelligence computing faces a large number of heterogeneous computing resources and a large number of nodes with hijacking possibility. Therefore, the traditional framework suitable for the centralized artificial intelligence of the cloud is not suitable for direct application, and a novel edge framework using a block chain as a support is provided.
To ensure user data privacy, in a blockchain-based mobile edge computing network, mobile devices are linked to the mobile edge network through 5G base stations and communicate with the nearest edge server; a shared key allows them to communicate. Communication authentication is based on identity management and access control: generally, a central authority is set up in a domain to manage identities, or identities are authenticated in a distributed manner; if an identity is authenticated as reliable, its allowed behavior is trusted.
The edge servers are primarily responsible for local network control, providing outsourced data storage and computation for local blockchain devices in a secure manner; some of them are incorporated into higher-level server blockchains. Docker containers are deployed on the edge nodes (typically edge servers), which execute smart contracts to ensure that transactions are securely and properly validated. The edge nodes use different types of blockchains according to different trust, privacy and fault-tolerance requirements. All Docker containers are uniformly managed by a Kubernetes cluster, which is responsible for managing and downloading the application programs.
Using various encryption algorithms, the edge nodes can establish cooperative relationships among multiple parties while guaranteeing privacy. Adding a new block to the blockchain requires an established consensus mechanism: every block records and stores a data link to the previous block, and a new block is appended to the ledger only if the corresponding message passes authentication by most participants. This mechanism design ensures good robustness under single-point failure and prevents malicious tampering with the data.
A network segmentation module 20, configured to segment the edge computing network based on the structure of the edge computing network to obtain segmentation points;
a point suitable for segmentation can be found from the edge computing network, the edge computing network is split, and the first half part runs on the mobile device.
The data transmission module 30 is used for uploading the feature data to the edge platform at the segmentation point;
and uploading the characteristic data to the edge platform at the segmentation point through a network, calculating the latter half part by the platform, and feeding back the result to the mobile terminal.
The data processing module 40 is used for processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools and feeding the computation result back to the mobile device.
The goal of AI acceleration based on the edge computing platform is to provide real-time computation, and the blockchain technology is introduced mainly to ensure the security of data and algorithms; however, this introduction increases time consumption to some extent. To guarantee the acceleration capability of edge AI, an FPGA (Field-Programmable Gate Array) is introduced to accelerate both the blockchain and the AI model.
The edge AI accelerated processing apparatus according to the present invention, wherein the data processing module 40 is specifically configured to:
split the channel-dimension features according to a first vector and a second vector using a tile technique, input the split features into a plurality of non-blocking parallel computing processing units, fetch a corresponding tile according to the number of computing processing units in each clock cycle, and recombine the tiles into a feature map in an on-chip cache through a blocking channel.
On-chip storage resources of the FPGA are limited, so it is difficult to load and compute all the pictures and features at once as a GPU (Graphics Processing Unit) does. Therefore, a tile technique is adopted: the features in CHW layout (with dimensions C, H and W) are split according to Cvec (the C-direction vector width, i.e. how many pixels are taken in the C direction) and Wvec (the W-direction vector width), and the split tiles are input into a plurality of non-blocking parallel PEs (processing elements); in each clock cycle a corresponding tile is fetched according to the number of PEs, and finally the tiles are recombined into a new feature map in the on-chip cache through a blocking channel.
Taking the AlexNet network structure as an example, a neural network computes feature maps through a number of operators; each layer has different input and output parameters, requires a different number of cycles, and has a different data flow direction, so a set of rules needs to be designed to achieve flexible algorithm porting.
The edge AI accelerated processing apparatus according to the present invention, wherein the data processing module 40 is specifically configured to:
layer the artificial intelligence model by convolution, wherein when an artificial intelligence model has N layers of convolution operations, each operand in the .h header file is described as an array of length num, with num = N;
split each layer's features in tile form after they are loaded into a cache, inputting them all into the computing units after a number of clock cycles, and preload the next layer's parameters into the cache while the feature values are being input into the computing units;
and store the trained operator model parameters of each layer sequentially in binary file form according to the channel dimension.
The edge computing network can be layered by convolution: if an edge computing network has N layers of convolution operations (a fully connected layer can be regarded as a special convolution), the description of each operand in the .h file is an array of length num, with num = N. AlexNet, for example, has 5 convolution layers, so N = 5. Each entry of the array is a specific description of the corresponding layer's parameters.
As described for the per-layer normalization function in AlexNet: CONSTANT bool norm_enabled[NUM_CONVOLUTIONS] = {true, true, false, false, false}; it can be seen that after the first two convolution layers the computation passes through the normalization function module, while the last three layers do not.
As described for the per-layer filters (model convolution parameters) in AlexNet: CONSTANT int filter_full_size[NUM_CONVOLUTIONS] = {11 * 11, 5 * 5, 3 * 3, 3 * 3, 3 * 3}; each entry describes the filter size of the corresponding layer.
The entire FPGA-based AI acceleration software design adopts a pipe-and-filter architecture.
As mentioned above, after each layer of features is loaded into the cache, the features are split in tile form and are all fed into the computing units after

Cycle = (C × W × H) / (Cvec × Wvec × Kvec)

clock cycles. To make full use of the hardware resources, a pipelined mode is used: while the feature values are being input into the computing units, the next layer's parameters are preloaded into the cache. The maximum number of cycles this preload may require is

Cycle(next) = (K × P × Q) / (Cvec × Wvec × Kvec),

so the number of clock cycles required to complete an operator is

Cycle = (Cycle > Cycle(next)) ? Cycle : Cycle(next);
The host side is responsible for invoking each operator in the FPGA in turn according to its Cycle count.
The operator model parameters of each layer are stored sequentially as binary .bin files according to the NCHW dimension order.
In the edge AI acceleration processing apparatus according to the present invention, after the edge computing network is segmented to obtain segmentation points:
the chain nodes of the blockchain are verified. A user invoking edge acceleration must first pass on-chain authentication; since credit is involved, the chain nodes of the blockchain also need to be verified.
In the edge AI acceleration processing apparatus according to the present invention, before the feature data is uploaded to the edge platform at the segmentation point:
the network condition is judged, and in response to the network-condition goodness being greater than a preset threshold, the feature data is uploaded to the edge platform at the segmentation point. The network-condition goodness refers to the degree of connectivity: if the network condition is good, the feature data is uploaded to the edge platform at the segmentation point; if the goodness is not greater than the preset threshold, the apparatus skips directly to the credit-increase on-chain step without uploading the feature data to the edge platform.
In the edge AI acceleration processing apparatus according to the present invention, before the feature data is uploaded to the edge platform at the segmentation point:
the segmentation point is adjusted according to the server load, and the feature data is uploaded to the edge platform at the adjusted segmentation point. Adjusting the segmentation point according to the server load takes load balancing into account.
In the edge AI acceleration processing apparatus according to the present invention, before the computation result is fed back to the mobile device:
a credit increase is recorded for the edge computing network on the blockchain based on the computation result.
The credibility of positive, useful results is increased, and information with high credibility participates in multi-user decision making; the data is transmitted to the edge-platform blockchain through edge 5G devices, analyzed, and the inference result is fed back to the user equipment.
Fig. 3 illustrates the physical structure of an electronic device, which may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform an edge AI acceleration processing method comprising:
s1, constructing a blockchain-integrated edge computing network;
s2, based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
s3, uploading the feature data to an edge platform at the segmentation point;
and S4, processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding the computation result back to the mobile device.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the edge AI acceleration processing method provided above, the method comprising:
s1, constructing a blockchain-integrated edge computing network;
s2, based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
s3, uploading the feature data to an edge platform at the segmentation point;
and S4, processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding the computation result back to the mobile device.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the edge AI acceleration processing method provided above, the method comprising:
s1, constructing a blockchain-integrated edge computing network;
s2, based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
s3, uploading the feature data to an edge platform at the segmentation point;
and S4, processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding the computation result back to the mobile device.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An edge AI acceleration processing method, comprising:
constructing a blockchain-integrated edge computing network;
based on the structure of the edge computing network, segmenting the edge computing network to obtain segmentation points;
uploading the feature data to an edge platform at the segmentation point;
and processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools, and feeding a computation result back to a mobile device.
2. The edge AI acceleration processing method according to claim 1, wherein processing the feature data based on the preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools comprises:
splitting the channel-dimension features according to a first vector and a second vector using a tile technique, inputting the split features into a plurality of non-blocking parallel computing processing units, fetching a corresponding tile according to the number of computing processing units in each clock cycle, and recombining the tiles into a feature map in an on-chip cache through a blocking channel.
3. The edge AI acceleration processing method according to claim 2, wherein processing the feature data based on the preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools further comprises:
layering the artificial intelligence model by convolution, wherein when an artificial intelligence model has N layers of convolution operations, each operand in the .h header file is described as an array of length num, with num = N;
after each layer of features is loaded into a cache, the features are split in tile form and are all input into the computing units after a number of clock cycles; while the feature values are being input into the computing units, the next layer's parameters are preloaded into the cache;
and storing the trained operator model parameters of each layer sequentially in binary file form according to the channel dimension.
4. The edge AI acceleration processing method according to claim 1, wherein after segmenting the edge computing network to obtain segmentation points, the method comprises:
verifying the chain nodes of the blockchain.
5. The edge AI acceleration processing method according to claim 1, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
judging the network condition, and in response to the network-condition goodness being greater than a preset threshold, uploading the feature data to the edge platform at the segmentation point.
6. The edge AI acceleration processing method according to claim 1, wherein before uploading the feature data to the edge platform at the segmentation point, the method comprises:
adjusting the segmentation point according to the server load, and uploading the feature data to the edge platform at the adjusted segmentation point.
7. The edge AI acceleration processing method according to claim 1, wherein before feeding the computation result back to the mobile device, the method comprises:
recording a credit increase for the edge computing network on the blockchain based on the computation result.
8. An edge AI acceleration processing apparatus, comprising:
the network construction module is used for constructing a blockchain-integrated edge computing network;
the network segmentation module is used for segmenting the edge computing network based on the structure of the edge computing network to obtain segmentation points;
the data transmission module is used for uploading the feature data to the edge platform at the segmentation point;
and the data processing module is used for processing the feature data based on preset field-programmable-gate-array-based artificial intelligence model conversion rules and tools and feeding the computation result back to the mobile device.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the edge AI acceleration processing method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the edge AI acceleration processing method according to any one of claims 1 to 7.
CN202110930879.XA · priority date 2021-08-13 · filing date 2021-08-13 · Edge AI acceleration processing method and device, electronic equipment and readable storage medium · Pending · CN115706703A (en)

Priority Applications (1)

CN202110930879.XA · priority date 2021-08-13 · filing date 2021-08-13 · Edge AI acceleration processing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

CN202110930879.XA · priority date 2021-08-13 · filing date 2021-08-13 · Edge AI acceleration processing method and device, electronic equipment and readable storage medium

Publications (1)

CN115706703A · published 2023-02-17

Family

ID=85180176

Family Applications (1)

CN202110930879.XA · CN115706703A (en), pending · priority/filing date 2021-08-13 · Edge AI acceleration processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115706703A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
CN112384947A * · priority 2017-12-28 · published 2021-02-19 · Intel Corporation (英特尔公司) · Visual fog
US20200404270A1 * · priority 2019-06-24 · published 2020-12-24 · Tencent America LLC · Flexible slice, tile and brick partitioning
JP6834097B1 * · priority 2020-05-15 · published 2021-02-24 · EdgeCortix Pte. Ltd. (エッジコーティックス ピーティーイー. リミテッド) · Hardware-specific partitioning of inference neural network accelerators

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIPING KANG; JOHANN HAUSWALD: "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge", ACM SIGPLAN Notices, 12 August 2017 (2017-08-12) *

Similar Documents

Publication Publication Date Title
Atieh The next generation cloud technologies: a review on distributed cloud, fog and edge computing and their opportunities and challenges
Yang et al. A parallel intelligence-driven resource scheduling scheme for digital twins-based intelligent vehicular systems
CN107515736B (en) Method for accelerating computation speed of deep convolutional network on embedded equipment
US11005857B2 (en) Systems and methods for securing industrial data streams with a fog root of trust
Mendis et al. A blockchain-powered decentralized and secure computing paradigm
CN110874571A (en) Training method and device of face recognition model
CN113537400B (en) Distribution and exit method of edge computing nodes based on branch neural network
Alferaidi et al. Federated learning algorithms to optimize the client and cost selections
Li et al. Sustainable CNN for robotic: An offloading game in the 3D vision computation
Beniiche et al. The way of the DAO: Toward decentralizing the tactile internet
CN116796338A (en) Online deep learning system and method for privacy protection
CN115706703A (en) Edge AI acceleration processing method and device, electronic equipment and readable storage medium
Kovtun et al. Model of functioning of the centralized wireless information ecosystem focused on multimedia streaming
CN116224791A (en) Collaborative training control method for intelligent manufacturing collaborative robot edge system
CN112738225B (en) Edge calculation method based on artificial intelligence
Silva et al. GDLS-FS: scaling feature selection for intrusion detection with GRASP-FS and distributed local search
CN115525921A (en) MPC-based federated learning model training and prediction method, system, device and medium
CN111709784B (en) Method, apparatus, device and medium for generating user retention time
Kaya et al. Communication-efficient zeroth-order distributed online optimization: Algorithm, theory, and applications
Shi et al. Edge-assisted federated learning: An empirical study from software decomposition perspective
CN117171766B (en) Data protection method, system and medium based on deep neural network model
Yu et al. Introduction to Federated Learning
Alemayehu et al. Distributed Edge Computing for DNA-Based Intelligent Services and Applications: A Review
WO2023124312A1 (en) Prediction method and apparatus in joint learning
CN116910161B (en) Collaborative analysis system, collaborative analysis method, electronic equipment and computer readable medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination