CN208013975U - The hardware device of on-line intelligence ability platform - Google Patents


Info

Publication number
CN208013975U
CN208013975U (application CN201820583759.0U)
Authority
CN
China
Prior art keywords
servers
gpu
server
storage server
hardware device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201820583759.0U
Other languages
Chinese (zh)
Inventor
李宇歌 (Li Yuge)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU CHAOJI INFORMATION TECHNOLOGY CO LTD
Original Assignee
SUZHOU CHAOJI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU CHAOJI INFORMATION TECHNOLOGY CO LTD filed Critical SUZHOU CHAOJI INFORMATION TECHNOLOGY CO LTD
Priority to CN201820583759.0U priority Critical patent/CN208013975U/en
Application granted granted Critical
Publication of CN208013975U publication Critical patent/CN208013975U/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The utility model relates to a hardware device of an online intelligence platform. A storage server is connected to the Ethernet through an external access network; the storage server, P4 GPU servers, P100 GPU servers, and a management server are connected to an InfiniBand switch through a compute network and to an Ethernet switch through a management network; a workgroup and the machine-room control centre are connected to the Ethernet switch through an internal access network. Through NVIDIA RDMA and GPUDirect technologies, GPU memory can be shared across nodes in a physical sense over the InfiniBand network. When a large volume of data is computed and analyzed, multiple servers and multiple GPUs are called on to complete the work jointly; when several types of data must be analyzed, the tasks are distributed to different servers, so that multiple models are trained concurrently.

Description

Hardware device of an online intelligence platform
Technical field
The utility model relates to a hardware device of an online intelligence platform.
Background technology
The patterns a neural network can recognize are numerical, so real-world data such as images, sound, text, and time series must first be converted into numerical form. In a deep-learning network, each layer of nodes learns to recognize a specific set of features based on the output of the previous layer. As the depth of the network increases, the features a node can recognize become increasingly complex, because each layer merges and recombines the features of the layer before it.
Utility model content
The purpose of the utility model is to overcome the shortcomings of the prior art and provide a hardware device of an online intelligence platform.
This purpose is achieved through the following technical solution:
The hardware device of the online intelligence platform is characterized by comprising a storage server, P4 GPU servers, P100 GPU servers, a management server, an InfiniBand switch, and an Ethernet switch. The storage server is connected to the Ethernet through an external access network; the storage server, the P4 GPU servers, the P100 GPU servers, and the management server are connected to the InfiniBand switch through a compute network and to the Ethernet switch through a management network; and the workgroup and the machine-room control centre are connected to the Ethernet switch through an internal access network.
Further, in the hardware device of the online intelligence platform above, the management server is an XG-22302EN server.
Further, in the hardware device above, the P4 GPU server is a PSC-HB1X 4U tower/rack convertible server.
Further, in the hardware device above, the P100 GPU server is an XG-48201GK server.
Further, in the hardware device above, the storage server is an XG-42301ST storage server.
Further, in the hardware device above, the InfiniBand switch is an SX6506 108-port InfiniBand switch.
Further, in the hardware device above, the Ethernet switch is a 24-port gigabit switch.
Compared with the prior art, the utility model has significant advantages and beneficial effects, embodied in the following aspects:
Through NVIDIA RDMA and GPUDirect technologies, the P4 GPU server nodes and P100 GPU server nodes of the hardware device can share GPU memory across nodes in a physical sense over the InfiniBand network. When a large volume of data is computed and analyzed, multiple servers and multiple GPUs can be called on to complete the task jointly; when several types of data must be analyzed, the tasks can be distributed to different servers according to the characteristics of each data type, so that multiple models are trained concurrently. The training flow is as follows: a worker initiates a training job to the management node; the management node requests compute resources; after the GPU cluster receives the job, it reads the training data from the storage server and trains locally; on completion, the results are written back to the storage server and reported to the management server node, which notifies the worker that the training job is complete.
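The training flow just described can be sketched in code. The sketch below is illustrative only and is not part of the utility model; all class and method names (`StorageServer`, `GpuCluster`, `ManagementNode`, `run_job`) are hypothetical, and the "training" step is a trivial placeholder.

```python
# Illustrative sketch of the training flow: a worker submits a job to the
# management node; the GPU cluster reads training data from the storage
# server, trains locally, writes results back, and the worker is notified.

class StorageServer:
    """Holds training data and receives results (storage server in Fig. 1)."""
    def __init__(self):
        self.data = {"dataset": [1, 2, 3]}
        self.results = {}

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        self.results[key] = value


class GpuCluster:
    """Stands in for the P4/P100 GPU nodes; 'training' is a placeholder sum."""
    def train(self, dataset):
        return sum(dataset)


class ManagementNode:
    """Allocates resources and relays status back to the worker."""
    def __init__(self, storage, cluster):
        self.storage = storage
        self.cluster = cluster

    def run_job(self, job_name, dataset_key):
        dataset = self.storage.read(dataset_key)  # GPU nodes read from storage
        model = self.cluster.train(dataset)       # local training on GPU nodes
        self.storage.write(job_name, model)       # results written back
        return f"{job_name}: training complete"   # worker notified


storage = StorageServer()
node = ManagementNode(storage, GpuCluster())
status = node.run_job("job-1", "dataset")
print(status)  # job-1: training complete
```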
Description of the drawings
Fig. 1: architecture diagram of the utility model.
The reference numerals in the figure have the following meanings: 2 — storage server; 3 — P100 GPU server; 4 — InfiniBand switch; 5 — management server; 6 — P4 GPU server; 7 — Ethernet switch; 8 — machine-room control centre; 9 — workgroup.
Specific embodiments
For a clearer understanding of the technical features, objectives, and effects of the utility model, the specific embodiments are now described in detail.
As shown in Fig. 1, the hardware device of the online intelligence platform includes a storage server 2, P4 GPU servers 6, P100 GPU servers 3, a management server 5, an InfiniBand switch 4, and an Ethernet switch 7. The storage server 2 is connected to the Ethernet through an external access network. The storage server 2, P4 GPU servers 6, P100 GPU servers 3, and management server 5 are connected to the InfiniBand switch 4 through the compute network (the data-transmission network) and to the Ethernet switch 7 through the management network. The workgroup 9 and the machine-room control centre 8 are connected to the Ethernet switch 7 through the internal access network.
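The wiring of Fig. 1 can be restated as a declarative table, one entry per network. This is only an illustrative summary of the topology; the labels are not identifiers from the patent.

```python
# The four networks of Fig. 1 and the nodes attached to each (illustrative).
topology = {
    "external_access": {"switch": "Ethernet",          "nodes": ["storage"]},
    "compute":         {"switch": "SX6506 InfiniBand", "nodes": ["storage", "p4_gpu", "p100_gpu", "management"]},
    "management":      {"switch": "24-port gigabit",   "nodes": ["storage", "p4_gpu", "p100_gpu", "management"]},
    "internal_access": {"switch": "24-port gigabit",   "nodes": ["workgroup", "control_centre"]},
}

# Consistency check: every server sits on both the compute network and the
# management network, as the description requires.
servers = {"storage", "p4_gpu", "p100_gpu", "management"}
for net in ("compute", "management"):
    assert servers <= set(topology[net]["nodes"])
print("topology consistent")  # topology consistent
```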
The management server 5 is an XG-22302EN server: a 2U chassis with 8 hot-plug 3.5-inch drive bays, supporting dual Intel Xeon E5-2600 v4 series CPUs, 16 DIMM slots, 3 PCI-E 3.0 x8 slots, and 3 PCI-E 3.0 x16 slots. To meet expansion needs, 2 enterprise-grade 480 GB SSDs in RAID1 serve as the system and scratch disks, and 4 enterprise-grade 4 TB HDDs in RAID10 serve as data disks, protected by an LSI 9271 RAID card with capacitor-backed cache, ensuring data safety and excellent storage performance. For networking, it integrates a dual-port 1 Gb NIC and a 56 Gb InfiniBand NIC. A 740 W 1+1 redundant power supply guarantees electrical stability over long-term operation and thereby the safety of the data.
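A back-of-envelope check of the usable capacity implied by the RAID levels above (ignoring filesystem overhead): RAID1 mirrors two disks, while RAID10 stripes over mirrored pairs.

```python
# Usable capacity of the management server's disk arrays (rough check).
def raid1_usable(disk_size):
    # Both disks hold the same mirrored copy, so one disk's worth is usable.
    return disk_size

def raid10_usable(n_disks, disk_size):
    # Striping over mirrored pairs keeps half the raw capacity.
    return n_disks * disk_size // 2

ssd_gb = raid1_usable(480)    # 2 x 480 GB SSD in RAID1
hdd_tb = raid10_usable(4, 4)  # 4 x 4 TB HDD in RAID10
print(ssd_gb, hdd_tb)  # 480 8
```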
The P4 GPU server 6 is a PSC-HB1X 4U tower/rack convertible server. As one part of the compute core of the cluster, it carries large-scale concurrency and provides substantial linear computing capability. It is a high-performance computing server with dedicated thermal design and supports the mainstream GPU processors on the market; it runs stably, with a fully redundant server-grade design, a tower/rack convertible form factor, and ample expansion room. The PSC-HB1X can hold 4 GPU cards and is fitted with NVIDIA Tesla P4 high-performance computing cards, providing single-precision computation. It uses dual E5-2650 v4 CPUs, and each compute node is configured with 256 GB of memory; networking integrates a dual-port 1 Gb NIC and a 56 Gb InfiniBand NIC. For storage it is configured with 3 enterprise-grade 8 TB mechanical hard disks in RAID5, on an LSI 9271 RAID card with capacitor-backed cache. The power supply is a 2000 W 1+1 redundant 80 PLUS Platinum unit.
The P100 GPU server 3 is an XG-48201GK server, the other part of the compute core of the cluster, providing double-precision computing capability. It can hold 8 P100 high-performance computing cards in a 4U space for high-density GPU deployment. It uses E5-2650 v4 CPUs, with each compute node configured with 256 GB of memory; networking integrates a dual-port 1 Gb NIC and a 56 Gb InfiniBand NIC. The power supply is a 1600 W 2+2 redundant unit with flexible power-mode adjustment.
The storage server 2 is an XG-42301ST storage server, the key component for storing the cluster's data: every data read by a GPU node passes through this node. It holds 24 3.5-inch hard disks in a 4U space; it uses dual E5-2620 v4 CPUs with 64 GB of memory, and networking integrates a dual-port 1 Gb NIC and a 56 Gb InfiniBand NIC. Each hard disk holds up to 8 TB, and a RAID50 disk array gives better protection for the data.
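The description specifies 24 disks of 8 TB in RAID50 but not the RAID5 group layout, so the usable capacity depends on an assumed group count (RAID50 loses one disk of parity per RAID5 group):

```python
# Usable capacity of a RAID50 array: one parity disk per RAID5 group.
def raid50_usable_tb(n_disks, disk_tb, groups):
    assert n_disks % groups == 0, "groups must divide the disk count evenly"
    return (n_disks - groups) * disk_tb

# e.g. two RAID5 groups of 12 disks each (an assumed layout, not from the patent):
print(raid50_usable_tb(24, 8, groups=2))  # 176
```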
The InfiniBand switch 4 is an SX6506 108-port InfiniBand switch. The Mellanox SX6506 switch system provides a peak-performance networking solution: in a 6U space it delivers up to 12.1 Tb/s of non-blocking bandwidth with port-to-port latency between 170 ns and 510 ns. Built on Mellanox's sixth-generation SwitchX-2 silicon, the SX6506 has 108 ports, each providing 56 Gb/s of full bidirectional bandwidth. Its switching capacity grows with the number of cluster nodes, so it scales on demand, offering a cost-effective interconnect for medium to very large clusters together with the availability and reliability of a core-class switch. In addition, its spine and leaf blades, management modules, power supplies, and fans are hot-swappable, which helps to minimize downtime, and the switch's built-in subnet manager enables out-of-the-box operation for networks of up to 648 nodes.
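The quoted figures are mutually consistent: 108 ports at 56 Gb/s each, counted bidirectionally, give the claimed 12.1 Tb/s aggregate.

```python
# Aggregate switch bandwidth: 108 ports x 56 Gb/s, full duplex.
ports, per_port_gbps = 108, 56
aggregate_tbps = ports * per_port_gbps * 2 / 1000  # x2: both directions
print(round(aggregate_tbps, 1))  # 12.1
```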
The Ethernet switch 7 is a 24-port gigabit switch that connects all nodes and serves as the management switch.
It should be noted that when building a high-performance computing cluster, the rational distribution of tasks must be considered: all compute tasks are controlled by the master server, which manages the sessions between the management node and the compute nodes. In fields such as deep learning, because the data involved are huge and complex, compute nodes must be called in different orders of magnitude depending on the demands of the data, and different types of data also call for different compute-node environments.
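The dispatch policy just described — node type chosen by data type, node count scaled with data volume — might be sketched as follows. The routing table and the sizing rule are illustrative assumptions, not part of the utility model.

```python
# Hypothetical master-server dispatcher: route by data type, size by volume.
ROUTE = {"image": "p100_gpu", "video": "p100_gpu", "text": "p4_gpu"}

def dispatch(task_type, data_gb, gb_per_node=64):
    """Return (node class, node count) for a task; the 64 GB/node rule is assumed."""
    nodes = max(1, -(-data_gb // gb_per_node))  # ceiling division
    return ROUTE.get(task_type, "p4_gpu"), nodes

print(dispatch("image", 200))  # ('p100_gpu', 4)
print(dispatch("text", 10))    # ('p4_gpu', 1)
```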
As noted, through NVIDIA RDMA and GPUDirect technologies, the P4 GPU server nodes and P100 GPU server nodes of the above hardware device can share GPU memory across nodes in a physical sense over the InfiniBand network. When a large volume of data is computed and analyzed, multiple servers and multiple GPUs can be called on to complete the task jointly; when several types of data must be analyzed, the tasks can be distributed to different servers according to the characteristics of each data type, so that multiple models are trained concurrently. The training flow is as follows: a worker initiates a training job to the management node; the management node requests compute resources; after the GPU cluster receives the job, it reads the training data from the storage server and trains locally; on completion, the results are written back to the storage server and reported to the management server node, which notifies the worker that the training job is complete.
It should be noted that the above description is merely a preferred embodiment of the utility model and does not limit its scope of protection. At the same time, the description should enable persons skilled in the relevant technical field to implement it, and any equivalent change or modification completed without departing from the spirit disclosed by the utility model shall therefore be covered by the claims.

Claims (7)

1. A hardware device of an online intelligence platform, characterized by comprising: a storage server, P4 GPU servers, P100 GPU servers, a management server, an InfiniBand switch, and an Ethernet switch, wherein the storage server is connected to the Ethernet through an external access network; the storage server, the P4 GPU servers, the P100 GPU servers, and the management server are connected to the InfiniBand switch through a compute network and to the Ethernet switch through a management network; and a workgroup and a machine-room control centre are connected to the Ethernet switch through an internal access network.
2. The hardware device of the online intelligence platform according to claim 1, characterized in that the management server is an XG-22302EN server.
3. The hardware device of the online intelligence platform according to claim 1, characterized in that the P4 GPU server is a PSC-HB1X 4U tower/rack convertible server.
4. The hardware device of the online intelligence platform according to claim 1, characterized in that the P100 GPU server is an XG-48201GK server.
5. The hardware device of the online intelligence platform according to claim 1, characterized in that the storage server is an XG-42301ST storage server.
6. The hardware device of the online intelligence platform according to claim 1, characterized in that the InfiniBand switch is an SX6506 108-port InfiniBand switch.
7. The hardware device of the online intelligence platform according to claim 1, characterized in that the Ethernet switch is a 24-port gigabit switch.
CN201820583759.0U 2018-04-23 2018-04-23 The hardware device of on-line intelligence ability platform Active CN208013975U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201820583759.0U CN208013975U (en) 2018-04-23 2018-04-23 The hardware device of on-line intelligence ability platform


Publications (1)

Publication Number Publication Date
CN208013975U true CN208013975U (en) 2018-10-26

Family

ID=63893366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201820583759.0U Active CN208013975U (en) 2018-04-23 2018-04-23 The hardware device of on-line intelligence ability platform

Country Status (1)

Country Link
CN (1) CN208013975U (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062929A (en) * 2018-06-11 2018-12-21 上海交通大学 A kind of query task communication means and system
CN109062929B (en) * 2018-06-11 2020-11-06 上海交通大学 Query task communication method and system
WO2020199560A1 (en) * 2019-04-03 2020-10-08 华为技术有限公司 Ai training network and method
WO2021063026A1 (en) * 2019-09-30 2021-04-08 华为技术有限公司 Inference service networking method and apparatus
CN113315794A (en) * 2020-02-26 2021-08-27 宝山钢铁股份有限公司 Hardware architecture of computing system network for online intelligent analysis of blast furnace production

Similar Documents

Publication Publication Date Title
CN208013975U (en) The hardware device of on-line intelligence ability platform
CN104917843B (en) Cloud storage and medical image seamless interfacing system
CN104135514B (en) Fusion type virtual storage system
CN102404201B (en) Method of realizing maximum bandwidth of Lustre concurrent file system
Shipman et al. The spider center wide file system: From concept to reality
CN102625608A (en) Design method for large-scale multiple-node server cabinets
US11102907B2 (en) Serviceability of a networking device with orthogonal switch bars
CN104951024B (en) A kind of large data all-in-one machine based on electric power application
CN105159617A (en) Pooled storage system framework
CN106919533B (en) 4U high-density storage type server
CN207764844U (en) A kind of data processing system
CN106814976A (en) Cluster storage system and apply its data interactive method
US11055252B1 (en) Modular hardware acceleration device
CN206649427U (en) A kind of server architecture for including dual control storage system
CN103677097B (en) Server rack system and server
CN107729200A (en) The method of testing and relevant apparatus of a kind of performance of storage system
CN108090011A (en) A kind of SAS Switch controllers extension framework and design method
CN206649421U (en) A kind of all-in-one machine structure
CN102799708B (en) Graphic processing unit (GPU) high-performance calculation platform device applied to electromagnetic simulation
CN205015812U (en) Big data all -in -one and rack based on electric power is used
CN204965251U (en) All -in -one device based on power equipment monitoring
CN106528463A (en) Four-subnode star server system capable of realizing hard disk sharing
CN206649424U (en) A kind of VHD green node server
CN206649422U (en) A kind of central processing unit hot-plug construction
CN113741642A (en) High-density GPU server

Legal Events

Date Code Title Description
GR01 Patent grant