CN111090503B - High-cost-performance cloud computing service system based on FPGA chip - Google Patents

High-cost-performance cloud computing service system based on FPGA chip

Info

Publication number
CN111090503B
Authority
CN
China
Prior art keywords
fpga
module
entering
data
cloud
Prior art date
Legal status
Active
Application number
CN201811248254.XA
Other languages
Chinese (zh)
Other versions
CN111090503A (en)
Inventor
张强
杨付收
赵小吾
龙瞻
田志明
荣义然
Current Assignee
Shanghai Xuehu Information Technology Co ltd
Original Assignee
Shanghai Xuehu Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xuehu Information Technology Co ltd
Priority to CN201811248254.XA
Publication of CN111090503A
Application granted
Publication of CN111090503B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 - Partitioning or combining of resources
    • G06F9/5072 - Grid computing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/544 - Buffers; Shared memory; Pipes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/547 - Remote procedure calls [RPC]; Web services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/54 - Indexing scheme relating to G06F9/54
    • G06F2209/549 - Remote execution
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a high-cost-performance cloud computing service system based on an FPGA chip, which belongs to the technical field of computer hardware, software and FPGA chips and comprises a cloud storage system, a scheduler module and FPGA computing units. The cloud storage system comprises pod node units, through which it accesses the scheduler modules, and each pod node unit corresponds to its own scheduler module. Each scheduler module in turn corresponds to its own FPGA computing units, and the scheduler module and the FPGA computing units exchange data and commands; the PS end receives data, parameters and commands from the scheduler module. The invention uses an arm linux operating system and high-cost-performance FPGA chips to realize a cloud parallel computing module. It is a low-cost, efficient and highly configurable FPGA cloud cluster scheme, in which a private socket service is configured in the arm system and a customized protocol is used to serve different requests.

Description

High-cost-performance cloud computing service system based on FPGA chip
Technical Field
The invention relates to the technical field of computer hardware and software FPGA chips, in particular to a high-cost performance cloud computing service system based on an FPGA chip.
Background
Existing cloud computing schemes that use FPGA chips rely on high-end chips communicating over a PCIE interface. This scheme has two drawbacks: the chips themselves are very expensive, and the PCIE interface in turn requires a highly configured PC host, which is also costly. Existing products are therefore expensive and make poor use of the FPGA chips; convenient deployment and flexible configuration are hard to achieve, and because of these cost constraints, large-scale deployment of existing products is very expensive. Moreover, if a high-end chip is used, one server can only run a single computation mode.
To address these problems, the present invention provides a high-cost-performance cloud computing service system based on an FPGA chip.
Disclosure of Invention
The invention aims to provide a high-cost-performance cloud computing service system based on an FPGA chip, so as to solve the problems raised in the background above: existing products are costly, make poor use of the FPGA chips, are not convenient to deploy or flexible to configure, and, because of the cost constraints, are very expensive to deploy on a large scale.
In order to achieve the above purpose, the present invention provides the following technical solution: a high-cost-performance cloud computing service system based on an FPGA chip comprises a cloud storage system, a scheduler module and FPGA computing units;
the cloud storage system comprises pod node units, through which it accesses the scheduler modules, and each pod node unit corresponds to its own scheduler module;
each scheduler module in turn corresponds to its own FPGA computing units, and the scheduler module and the FPGA computing units exchange data and commands;
the PS end receives data, parameters and commands from the scheduler module.
Preferably, the data transmitted by the scheduler module comprise FPGA control parameters, the weight parameters required by the CNN network, and the data and operation commands, transmitted in that order: the FPGA control parameters first, then the CNN weight parameters, then the data and operation commands.
Preferably, the scheduler module comprises a program distribution module and a scheduling module;
the program distribution module is used for receiving request commands from the cloud and for passing the picture-cutting task on to the scheduling module;
the program distribution module comprises an FPGA parallel computation selection unit, which is used to select whether the program distribution module needs to be entered;
the scheduling module is used for merging the pictures that were scheduled and segmented and for returning the result data to the user along the original path;
an FPGA computing unit support-type judging unit is arranged between the scheduling module and the program distribution module and is used for judging whether to enter the scheduling module.
Preferably, the scheduling module comprises a routing module, which is used for wireless data transmission of the FPGA computing unit, and the network port of the arm system contained in the FPGA computing unit corresponds to the routing module.
Preferably, there is at least one FPGA computing unit; each FPGA computing unit comprises at least 24 independent FPGA chips, and each FPGA chip is embedded in an arm system.
Preferably, when the PS end sends the start command to the PL end, the FPGA chip starts the calculation using the control parameters transmitted previously.
Compared with the prior art, the invention has the following beneficial effects. The invention uses an arm linux operating system and high-cost-performance FPGA chips to realize a cloud parallel computing module; it is a low-cost, efficient and highly configurable FPGA cloud cluster scheme, in which a private socket service is configured in the arm system and a customized protocol is used to serve different requests. The cost-effective FPGA chips and the micro-services provided by the embedded system join the cloud computing system directly over network cables, which solves the problem of high cost: one FPGA server carries 24 independently running, cost-effective FPGA chips at roughly 1/10 the cost of a GPU server of the same computing power, and the chips can be configured into different algorithm structures as required and run simultaneously. The micro-service provided by the embedded system mainly means running a socket server in the embedded linux system to handle computation requests from clients. When the requested computation is large, the scheduler distributes the task; for example, when CNN image processing is performed, the picture can be cut up and the pieces sent to different socket servers; each server calls the FPGA to compute after receiving a request and returns the result to the client, which then merges the pieces back into one picture. This scheme greatly reduces hardware cost: only a switch, network cables and cost-effective FPGA chips are needed to deploy it in the cloud. At the same time, the configuration in the cloud is flexible: an FPGA cluster can be set up conveniently, and the cluster can be configured with different functions according to user needs, for example running different CNN (convolutional neural network) structures within the cluster, or being configured as an RNN network structure.
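To make the socket micro-service concrete, the following Python sketch shows a minimal arm-side server loop. The port number, the 4-byte length-prefixed framing and the fpga_compute() helper are assumptions introduced for the example; the patent only states that a private socket service with a customized protocol runs in the embedded linux system.

import socket
import struct

HOST, PORT = "0.0.0.0", 9000  # assumed address of the arm-side service

def fpga_compute(tile_bytes):
    """Placeholder for the call into the FPGA (PL) through the PS-side driver."""
    return tile_bytes  # echoes the input; a real system would return the CNN result

def recv_exact(conn, n):
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("client closed connection")
        buf += chunk
    return buf

def serve():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                # length-prefixed request: 4-byte big-endian size, then the picture tile
                (size,) = struct.unpack(">I", recv_exact(conn, 4))
                tile = recv_exact(conn, size)
                result = fpga_compute(tile)
                conn.sendall(struct.pack(">I", len(result)) + result)

if __name__ == "__main__":
    serve()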
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. It will be apparent that the drawings in the following description show only some embodiments of the invention, and that a person skilled in the art could derive other drawings from them without inventive effort.
FIG. 1 is a diagram of a cloud computing overall framework of the present invention;
FIG. 2 is a diagram of an FPGA computational cell of the present invention;
FIG. 3 is a diagram of a framework of a call FPGA computing module of the present invention;
FIG. 4 is a diagram of an internal computing framework of a computing unit of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-4, the present invention provides the following technical solution: a high-cost-performance cloud computing service system based on an FPGA chip comprises a cloud storage system, a scheduler module and FPGA computing units;
the cloud storage system comprises pod node units, through which it accesses the scheduler modules, and each pod node unit corresponds to its own scheduler module;
each scheduler module in turn corresponds to its own FPGA computing units, and the scheduler module and the FPGA computing units exchange data and commands;
the PS end receives data, parameters and commands from the scheduler module.
In a further embodiment, the data transmitted by the scheduler module comprise FPGA control parameters, the weight parameters required by the CNN network, and the data and operation commands, transmitted in that order: the FPGA control parameters first, then the CNN weight parameters, then the data and operation commands;
in a further embodiment, the scheduler module comprises a program distribution module and a scheduling module;
the program distribution module is used for receiving request commands from the cloud and for passing the picture-cutting task on to the scheduling module;
the program distribution module comprises an FPGA parallel computation selection unit, which is used to select whether the program distribution module needs to be entered;
the scheduling module is used for merging the pictures that were scheduled and segmented and for returning the result data to the user along the original path;
an FPGA computing unit support-type judging unit is arranged between the scheduling module and the program distribution module and is used for judging whether to enter the scheduling module;
in a further embodiment, the scheduling module comprises a routing module, which is used for wireless data transmission of the FPGA computing unit, and the network port of the arm system contained in the FPGA computing unit corresponds to the routing module;
in a further embodiment, there is at least one FPGA computing unit; each FPGA computing unit comprises at least 24 independent FPGA chips, and each FPGA chip is embedded in an arm system;
in a further embodiment, when the PS end sends the start command to the PL end, the FPGA chip starts the calculation using the control parameters transmitted previously.
as shown in the overall frame diagram of fig. 1, an FPGA computing unit module is added in the existing k8s system in the cloud, a scheduler is accessed to the pod node of k8s, different pod nodes correspond to different schedulers, each scheduler module corresponds to different FPGA computing units, and when a user request arrives at the cloud, the flow is as described in 2.1.2, and the scheduler module performs data and command interaction with the FPGA computing units. The FPGA computing unit in the scheme exists as an independent functional module in the k8s system, and is easy to deploy.
For a single FPGA computing unit, as shown in fig. 2, each FPGA computing unit contains 24 individual FPGA chips, each embedded in an arm system. The computing unit is equipped with a routing module connected to the network ports of all of its arm systems, thereby ensuring normal communication with the servers.
As shown in fig. 3, when a user sends a data request through a terminal (for example, CNN picture processing), the picture data is sent to the cloud for processing. After receiving the request, the cloud decides whether to enter FPGA cluster computation. If so, it first checks whether the requested type of CNN algorithm is already supported; if not, the flow enters exception handling, and if it is supported, the database is called to load the bit file for that type into the corresponding FPGA cluster. The request then enters the dispatcher, whose main task is to judge the picture size and the computation workload. If the picture is large, it is cut into relatively small pictures (for example 512 × 512), and the cut pictures are transmitted in parallel to the embedded micro-services on the FPGA side. After the FPGA finishes computing, the data is written to a designated memory address. The micro-service on the server side returns the data to the dispatcher, which merges the segmented pictures and returns the result to the user terminal along the original path.
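The dispatcher behaviour just described (cut a large picture into 512 × 512 tiles, send the tiles to the embedded micro-services in parallel, merge the returned results) can be sketched as follows in Python. The send_tile() helper, the endpoint list and the use of numpy arrays are assumptions made for illustration only.

from concurrent.futures import ThreadPoolExecutor
import numpy as np

TILE = 512
# 24 arm-side micro-service endpoints behind one unit's routing module (assumed addresses)
ENDPOINTS = [("10.0.1.%d" % i, 9000) for i in range(1, 25)]

def send_tile(endpoint, tile):
    """Placeholder for the socket round trip to one embedded micro-service;
    a real implementation would return the FPGA's CNN output for the tile."""
    return tile

def dispatch(image):
    """Cut a large picture into tiles, send them out in parallel, merge the results."""
    h, w = image.shape[:2]
    jobs, coords = [], []
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        for y in range(0, h, TILE):
            for x in range(0, w, TILE):
                tile = image[y:y + TILE, x:x + TILE]
                endpoint = ENDPOINTS[len(jobs) % len(ENDPOINTS)]  # round-robin over the chips
                jobs.append(pool.submit(send_tile, endpoint, tile))
                coords.append((y, x))
        result = np.zeros_like(image)
        for (y, x), job in zip(coords, jobs):
            out = job.result()
            result[y:y + out.shape[0], x:x + out.shape[1]] = out
    return result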
Fig. 4 shows the internal calculation flow of the arm-side FPGA chip. The PS end receives data, parameters and commands from the scheduler module. According to the actual functional requirement, the PS end first transmits the FPGA control parameters, then the weight parameters required by the CNN network, and then the data and operation commands. When the PS end sends the start command to the PL (FPGA) end, the FPGA chip starts computing with the control parameters transmitted previously; the data needed during the computation is obtained through memory shared with the PS end. When the operation is finished, the PL notifies the PS end to fetch the data, which is likewise obtained through the shared memory.
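On an arm linux system, the PS-side sequence of fig. 4 might look roughly like the following Python sketch. The register offsets, the shared-DDR window, the memory layout and the polling loop that stands in for the PL's completion notification are all assumptions; the patent does not disclose these details.

import mmap
import os
import struct
import time

CTRL_BASE = 0x43C00000      # assumed AXI control-register window for the PL
SHARED_BASE = 0x10000000    # assumed DDR region shared between PS and PL
SHARED_SIZE = 16 * 1024 * 1024
REG_START, REG_DONE = 0x00, 0x04   # assumed register offsets

def phys_map(base, size):
    """Map a physical address window into the PS-side process via /dev/mem."""
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    return mmap.mmap(fd, size, offset=base)

def run_inference(ctrl_params, weights, data):
    regs = phys_map(CTRL_BASE, 4096)
    ddr = phys_map(SHARED_BASE, SHARED_SIZE)
    # 1) FPGA control parameters, 2) CNN weight parameters, 3) input data,
    #    written back to back into the shared region (layout is an assumption)
    ddr[0:len(ctrl_params)] = ctrl_params
    off = len(ctrl_params)
    ddr[off:off + len(weights)] = weights
    off += len(weights)
    ddr[off:off + len(data)] = data
    # 4) start command: the PL begins computing with the parameters transmitted above
    regs[REG_START:REG_START + 4] = struct.pack("<I", 1)
    # 5) wait for the PL's completion flag, then read the result back
    #    through the same shared memory
    while struct.unpack("<I", regs[REG_DONE:REG_DONE + 4])[0] == 0:
        time.sleep(0.001)
    return bytes(ddr[off:off + len(data)])   # result location is also assumed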
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. They are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A high-cost-performance cloud computing service system based on an FPGA chip, characterized in that: the system comprises a cloud storage system, a scheduler module and an FPGA computing unit;
the cloud storage system comprises a pod node unit, through which the cloud storage system accesses the scheduler module, and each pod node unit corresponds to its own scheduler module;
each scheduler module corresponds to its own FPGA computing unit, and the scheduler module and the FPGA computing unit exchange data and commands;
the PS end receives data, parameters and commands from the scheduler module;
the scheduler module comprises a program distribution module and a scheduling module;
the program distribution module is used for receiving request commands from the cloud and for passing the picture-cutting task on to the scheduling module;
the program distribution module comprises an FPGA parallel computation selection unit, which is used to select whether the program distribution module needs to be entered;
the scheduling module is used for merging the pictures that were scheduled and segmented and for returning the result data to the user along the original path;
an FPGA computing unit support-type judging unit is arranged between the scheduling module and the program distribution module and is used for judging whether to enter the scheduling module.
2. The high-cost-performance cloud computing service system based on an FPGA chip of claim 1, characterized in that: the data transmitted by the scheduler module comprise FPGA control parameters, the weight parameters required by the CNN network, and the data and operation commands, transmitted in that order: the FPGA control parameters first, then the weight parameters required by the CNN network, then the data and operation commands.
3. The high-cost-performance cloud computing service system based on an FPGA chip of claim 1, characterized in that: the scheduling module comprises a routing module, which is used for wireless data transmission of the FPGA computing unit, and the network port of the arm system contained in the FPGA computing unit corresponds to the routing module.
4. The high-cost-performance cloud computing service system based on an FPGA chip of claim 3, characterized in that: the number of FPGA computing units is at least one, each FPGA computing unit comprises at least 24 independent FPGA chips, and each FPGA chip is embedded in an arm system.
5. The high-cost-performance cloud computing service system based on an FPGA chip of any one of claims 1 and 4, characterized in that: when the PS end sends the start command to the PL end, the FPGA chip starts the calculation using the previously transmitted control parameters.
CN201811248254.XA 2018-10-24 2018-10-24 High-cost-performance cloud computing service system based on FPGA chip Active CN111090503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811248254.XA CN111090503B (en) 2018-10-24 2018-10-24 High-cost-performance cloud computing service system based on FPGA chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811248254.XA CN111090503B (en) 2018-10-24 2018-10-24 High-cost-performance cloud computing service system based on FPGA chip

Publications (2)

Publication Number Publication Date
CN111090503A CN111090503A (en) 2020-05-01
CN111090503B (en) 2023-07-21

Family

ID=70392205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811248254.XA Active CN111090503B (en) 2018-10-24 2018-10-24 High-cost-performance cloud computing service system based on FPGA chip

Country Status (1)

Country Link
CN (1) CN111090503B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108465B1 (en) * 2016-06-23 2018-10-23 EMC IP Holding Company LLC Automated cloud service evaluation and workload migration utilizing standardized virtual service units
CN106228238A (en) * 2016-07-27 2016-12-14 中国科学技术大学苏州研究院 The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform
CN108228354A (en) * 2017-12-29 2018-06-29 杭州朗和科技有限公司 Dispatching method, system, computer equipment and medium
CN108304250A (en) * 2018-03-05 2018-07-20 北京百度网讯科技有限公司 Method and apparatus for the node for determining operation machine learning task

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Kaiyuan Guo. "Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017, full text. *
朱金升. "Applied research on FPGA-based recognition of specific targets in UAV aerial images." China Masters' Theses Full-text Database (Information Science and Technology), 2018, No. 01, full text. *
树岸; 彭鑫; 赵文耘. "Adaptive management of cloud computing resources based on container technology." Computer Science, 2017, No. 07, full text. *

Also Published As

Publication number Publication date
CN111090503A (en) 2020-05-01

Similar Documents

Publication Publication Date Title
CN107087019B (en) Task scheduling method and device based on end cloud cooperative computing architecture
CN102447624B (en) Load balancing method in server cluster, as well as node server and cluster
CN110602156A (en) Load balancing scheduling method and device
US10609125B2 (en) Method and system for transmitting communication data
CN112631788B (en) Data transmission method and data transmission server
CN101442493A (en) Method for distributing IP message, cluster system and load equalizer
CN110830574B (en) Method for realizing intranet load balance based on docker container
CN105472291A (en) Digital video recorder with multiprocessor cluster and realization method of digital video recorder
WO2021120633A1 (en) Load balancing method and related device
CN103441937A (en) Sending method and receiving method of multicast data
WO2017050036A1 (en) Resource allocation information transmission and data distribution method and device
US11736403B2 (en) Systems and methods for enhanced autonegotiation
EP3631639B1 (en) Communications for field programmable gate array device
CN109245926A (en) Intelligent network adapter, intelligent network adapter system and control method
CN114710571B (en) Data packet processing system
WO2013189069A1 (en) Load sharing method and device, and single board
CN104104736A (en) Cloud server and use method thereof
CN112422251B (en) Data transmission method and device, terminal and storage medium
CN111090503B (en) High-cost-performance cloud computing service system based on FPGA chip
CN109525443B (en) processing method and device for distributed pre-acquisition communication link and computer equipment
CN111147603A (en) Method and device for networking reasoning service
CN109831467B (en) Data transmission method, equipment and system
CN111245878A (en) Method for computing and offloading communication network based on hybrid cloud computing and fog computing
CN110166368B (en) Cloud storage network bandwidth control system and method
US11303524B2 (en) Network bandwidth configuration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant