CN208013975U - The hardware device of on-line intelligence ability platform - Google Patents
- Publication number: CN208013975U
- Legal status: Active
Abstract
The utility model relates to a hardware device for an online intelligence platform. The storage server connects to the Ethernet through an external access network; the storage server, P4 GPU servers, P100 GPU servers and management server connect to an InfiniBand switch through a compute network, and to an Ethernet switch through a management network; the working group and the machine-room control centre connect to the Ethernet switch through an internal access network. Through NVIDIA RDMA and GPUDirect technologies, the InfiniBand network enables GPU memory sharing in the physical sense. When analyzing large volumes of data, multiple servers and multiple GPUs are called on to complete the task jointly; when heterogeneous data must be analyzed, the tasks are distributed to different servers, so that multiple models are trained concurrently.
Description
Technical field
The utility model relates to a hardware device for an online intelligence platform.
Background art
The patterns a neural network can recognize take numeric form, so real-world data such as images, sound, text and time series must first be converted into numerical values. In a deep learning network, each layer of nodes learns to recognize a specific set of features based on the output of the previous layer. As the depth of the network increases, the features the nodes can recognize grow increasingly complex, because each layer merges and recombines the features of the layer before it.
Utility model content
The purpose of the utility model is to overcome the shortcomings of the prior art and to provide a hardware device for an online intelligence platform.
This purpose is achieved through the following technical solution:
The hardware device of the online intelligence platform is characterized in that it comprises a storage server, P4 GPU servers, P100 GPU servers, a management server, an InfiniBand switch and an Ethernet switch. The storage server connects to the Ethernet through an external access network. The storage server, P4 GPU servers, P100 GPU servers and management server connect to the InfiniBand switch through a compute network, and to the Ethernet switch through a management network. The working group and the machine-room control centre connect to the Ethernet switch through an internal access network.
Further, the management server is an XG-22302EN server.
Further, the P4 GPU server is a PSC-HB1X 4U tower/rack-convertible server.
Further, the P100 GPU server is an XG-48201GK server.
Further, the storage server is an XG-42301ST storage server.
Further, the InfiniBand switch is an SX6506 108-port InfiniBand switch.
Further, the Ethernet switch is a 24-port gigabit switch.
Compared with the prior art, the utility model has notable advantages and beneficial effects, embodied in the following aspects:
The P4 GPU server nodes and P100 GPU server nodes of the hardware device can, through NVIDIA RDMA and GPUDirect technologies, share GPU memory in the physical sense over the InfiniBand network. When analyzing large volumes of data, multiple servers and multiple GPUs can be called on to complete the task jointly; when heterogeneous data must be analyzed, the tasks can be distributed to different servers according to the characteristics of each data type, so that multiple models are trained concurrently. Training flow: a worker initiates a training task and submits it to the management node; the management node requests compute resources; after receiving the task, the GPU cluster reads the training data from the storage server and trains locally; on completion, the results are written back to the storage server and the management server node is notified, and the management server node prompts the worker that the training task is complete.
Description of the drawings
Fig. 1: Architecture diagram of the utility model.
The meaning of each reference numeral in the figure is given in the table below:
Specific embodiments
For a clearer understanding of the technical features, objects and effects of the utility model, specific embodiments are now described in detail.
As shown in Fig. 1, the hardware device of the online intelligence platform comprises a storage server 2, P4 GPU servers 6, P100 GPU servers 3, a management server 5, an InfiniBand switch 4 and an Ethernet switch 7. The storage server 2 connects to the Ethernet through an external access network. The storage server 2, P4 GPU servers 6, P100 GPU servers 3 and management server 5 connect to the InfiniBand switch 4 through a compute network (the data transmission network), and to the Ethernet switch 7 through a management network. The working group 9 and machine-room control centre 8 connect to the Ethernet switch 7 through an internal access network.
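The network attachments described above can be captured as a small data structure. This is a minimal sketch of the topology, not part of the patent; all node and network identifiers below are illustrative labels.

```python
# Sketch of the separate networks of the embodiment. Names are illustrative
# labels chosen for this example, not identifiers from the patent.
TOPOLOGY = {
    # Ethernet reached by the storage server via the external access network
    "external_access_network": {
        "switch": "Ethernet",
        "nodes": ["storage_server"],
    },
    # 56Gb/s InfiniBand compute (data transmission) network, SX6506 switch
    "compute_network": {
        "switch": "InfiniBand SX6506",
        "nodes": ["storage_server", "p4_gpu_server",
                  "p100_gpu_server", "management_server"],
    },
    # Management network on the gigabit Ethernet switch
    "management_network": {
        "switch": "24-port gigabit Ethernet",
        "nodes": ["storage_server", "p4_gpu_server",
                  "p100_gpu_server", "management_server"],
    },
    # Internal access network for the working group and control centre
    "internal_access_network": {
        "switch": "24-port gigabit Ethernet",
        "nodes": ["working_group", "control_center"],
    },
}

def networks_of(node):
    """Return the networks a given node is attached to."""
    return [name for name, net in TOPOLOGY.items() if node in net["nodes"]]
```

For example, `networks_of("working_group")` yields only the internal access network, while the four server classes appear on both the compute and management networks.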
The management server 5 is an XG-22302EN server: a 2U chassis with eight 3.5-inch hot-plug drive bays, supporting dual Xeon E5-2600 v4 series CPUs, 16 memory slots, three PCI-E 3.0 x8 slots and three PCI-E 3.0 x16 slots, which meets expansion needs. Two enterprise 480G SSDs (RAID 1) serve as the system and data-call disks, and four enterprise 4TB HDDs (RAID 10) serve as data disks, protected by an LSI 9271 RAID card with capacitor-backed cache, ensuring data safety and excellent storage performance. For networking, it integrates dual-port 1Gb NICs and a 56Gb IB NIC. A 740W 1+1 redundant power supply ensures electrical stability during long-term operation and safeguards the data.
The P4 GPU server 6 is a PSC-HB1X 4U tower/rack-convertible server. As part of the compute centre of the whole cluster, it carries large-scale concurrency and provides substantial linear computing capability; it is a high-performance computing server with dedicated heat dissipation that supports the mainstream GPU processors on the market, offering stable operation, a fully redundant server-grade design, a tower/rack-convertible chassis and ample expansion space. The PSC-HB1X can host four GPU compute cards; it is fitted with NVIDIA Tesla P4 high-performance compute cards, which provide single-precision computation. The CPUs are dual E5-2650 v4; the compute node is configured with 256G of memory; networking integrates dual-port 1Gb NICs and a 56Gb IB NIC. For storage, it is configured with three 8TB enterprise mechanical hard disks in RAID 5, with an LSI 9271 RAID card and capacitor-backed cache protection. The power supply is a 2000W 1+1 redundant 80 PLUS Platinum unit.
The P100 GPU server 3 is an XG-48201GK server. As the other part of the cluster's compute centre, it provides double-precision computing capability and high-density GPU deployment, with eight P100 high-performance compute cards configured in a 4U space. The CPUs are E5-2650 v4; the compute node is configured with 256G of memory; networking integrates dual-port 1Gb NICs and a 56Gb IB NIC. The power supply is a 1600W 2+2 redundant unit whose power modes can be flexibly adjusted.
The storage server 2 is an XG-42301ST storage server, the key component for storing the cluster's data: every GPU node reads its data through this node. It is configured with 24 3.5-inch hard disks in a 4U space; the CPUs are dual E5-2620 v4; the node is configured with 64G of memory; networking integrates dual-port 1Gb NICs and a 56Gb IB NIC. Each hard disk holds up to 8TB, and a RAID 50 disk array better protects data safety.
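The servers above use RAID 1, RAID 10, RAID 5 and RAID 50 arrays. As a rough sketch of the usable capacity each level yields, assuming equal-size disks and, for RAID 50, two RAID 5 spans (real controllers reserve some additional overhead):

```python
# Approximate usable capacity for the RAID levels used in the embodiment.
# Assumes equal-size disks; RAID 50 is modeled as two RAID 5 spans.
def usable_tb(level, disks, disk_tb):
    if level == "raid1":               # mirror pair: capacity of one disk
        return disk_tb
    if level == "raid10":              # mirrored stripes: half the raw capacity
        return disks * disk_tb / 2
    if level == "raid5":               # one disk's worth of parity
        return (disks - 1) * disk_tb
    if level == "raid50":              # two RAID 5 spans -> two parity disks
        return (disks - 2) * disk_tb
    raise ValueError("unknown RAID level: %s" % level)
```

On these assumptions, the management server's 4x4TB RAID 10 yields about 8TB, the P4 server's 3x8TB RAID 5 about 16TB, and the storage server's 24x8TB RAID 50 about 176TB of usable space.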
The InfiniBand switch 4 is an SX6506 108-port InfiniBand switch. The Mellanox SX6506 switch system provides a peak-performance networking solution: within a 6U space, the SX6506 offers up to 12.1Tb/s of non-blocking bandwidth with port-to-port latency of 170ns to 510ns. Built on Mellanox's sixth-generation SwitchX-2 silicon, the SX6506 InfiniBand switch has 108 ports, each providing 56Gb/s of full bidirectional bandwidth. Its switching capacity grows with the number of cluster nodes, enabling on-demand expansion; it offers a cost-effective interconnect for medium to ultra-large clusters while providing availability and reliability comparable to core-class switches. In addition, the spine and leaf modules, management modules, power supplies and fans are hot-swappable, which helps shorten downtime, and the subnet manager built into the SX6506 enables out-of-the-box operation for networks of up to 648 nodes.
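The quoted 12.1Tb/s figure is consistent with 108 ports at 56Gb/s each, counted in both directions, as a quick arithmetic check shows:

```python
# Sanity check of the switch capacity quoted above: 108 ports x 56Gb/s,
# counted bidirectionally, matches the 12.1Tb/s non-blocking figure.
ports = 108
per_port_gbps = 56
aggregate_tbps = ports * per_port_gbps * 2 / 1000  # both directions, Gb -> Tb
assert round(aggregate_tbps, 1) == 12.1
```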
The Ethernet switch 7 is a 24-port gigabit switch; it connects all nodes and serves as the management switch.
It should be noted that, when building a high-performance computing cluster, the rational distribution of tasks must be considered: all computing tasks are controlled by a master server, which mediates the dialogue between the management node and the compute nodes. In fields such as deep learning, the data involved are huge and complex, so compute nodes must be called in different numbers and at different orders of magnitude according to the demands of the data; and for different types of data, the usage environment of the compute nodes also differs.
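As a sketch of this task distribution, the snippet below routes tasks either to the single-precision P4 pool or the double-precision P100 pool, matching the roles the embodiment assigns to each node class. The pool sizes, node names and round-robin policy are illustrative assumptions, not taken from the patent.

```python
# Illustrative dispatcher for the master/management server: route each task
# to the node class suited to its data type. Pool sizes and names are
# hypothetical; the patent only describes distribution by data type.
from collections import defaultdict

NODE_POOLS = {
    "p4": ["p4-node-%d" % i for i in range(4)],      # single precision (Tesla P4)
    "p100": ["p100-node-%d" % i for i in range(2)],  # double precision (Tesla P100)
}

def dispatch(tasks):
    """Assign each (name, needs_double_precision) task to a node,
    round-robin within the chosen pool."""
    counters = defaultdict(int)
    assignment = {}
    for name, needs_double in tasks:
        pool = "p100" if needs_double else "p4"
        nodes = NODE_POOLS[pool]
        assignment[name] = nodes[counters[pool] % len(nodes)]
        counters[pool] += 1
    return assignment
```

Dispatching a mixed batch then spreads single-precision jobs across the P4 nodes while double-precision jobs land on the P100 nodes, so several models can train concurrently.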
Accordingly, the P4 GPU server nodes and P100 GPU server nodes of the above hardware device can, through NVIDIA RDMA and GPUDirect technologies, share GPU memory in the physical sense over the InfiniBand network. When analyzing large volumes of data, multiple servers and multiple GPUs can be called on to complete the task jointly; when heterogeneous data must be analyzed, the tasks can be distributed to different servers according to the characteristics of each data type, so that multiple models are trained concurrently. Training flow: a worker initiates a training task and submits it to the management node; the management node requests compute resources; after receiving the task, the GPU cluster reads the training data from the storage server and trains locally; on completion, the results are written back to the storage server while the management server node is notified, and the management server node prompts the worker that the training task is complete.
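The training flow just described can be sketched with a few stub classes that record the sequence of steps. Every class and method name here is illustrative: the patent specifies only the ordering of the steps, not an API.

```python
# Minimal sketch of the training flow: worker -> management node ->
# GPU cluster -> storage server and back. All names are hypothetical.
class StorageServer:
    def __init__(self, datasets):
        self.datasets = datasets   # task name -> training data
        self.results = {}

    def read(self, task):
        return self.datasets[task]

    def write(self, task, result):
        self.results[task] = result

class GPUCluster:
    def train(self, data):
        # Stand-in for local training on the allocated GPU nodes.
        return {"model": "trained", "samples": len(data)}

class ManagementNode:
    def __init__(self, cluster, storage):
        self.cluster, self.storage = cluster, storage
        self.completed = []

    def run(self, task):
        data = self.storage.read(task)      # cluster reads data from storage
        result = self.cluster.train(data)   # trains locally on GPU nodes
        self.storage.write(task, result)    # results written back to storage
        self.completed.append(task)         # management node receives feedback
        return "task %s complete" % task    # worker is prompted on completion
```

A run such as `ManagementNode(GPUCluster(), StorageServer({"demo": [1, 2, 3]})).run("demo")` walks the steps in the order the description gives them.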
It should be noted that the above description is merely a preferred embodiment of the utility model and does not limit its scope of rights; at the same time, the above description should enable persons skilled in the relevant technical field to understand and implement it. Therefore, equivalent changes or modifications completed without departing from the spirit disclosed by the utility model shall all be included within the scope of the claims.
Claims (7)
1. The hardware device of an online intelligence platform, characterized in that it comprises a storage server, P4 GPU servers, P100 GPU servers, a management server, an InfiniBand switch and an Ethernet switch; the storage server connects to the Ethernet through an external access network; the storage server, P4 GPU servers, P100 GPU servers and management server connect to the InfiniBand switch through a compute network; the storage server, P4 GPU servers, P100 GPU servers and management server connect to the Ethernet switch through a management network; and the working group and machine-room control centre connect to the Ethernet switch through an internal access network.
2. The hardware device of the online intelligence platform according to claim 1, characterized in that the management server is an XG-22302EN server.
3. The hardware device of the online intelligence platform according to claim 1, characterized in that the P4 GPU server is a PSC-HB1X 4U tower/rack-convertible server.
4. The hardware device of the online intelligence platform according to claim 1, characterized in that the P100 GPU server is an XG-48201GK server.
5. The hardware device of the online intelligence platform according to claim 1, characterized in that the storage server is an XG-42301ST storage server.
6. The hardware device of the online intelligence platform according to claim 1, characterized in that the InfiniBand switch is an SX6506 108-port InfiniBand switch.
7. The hardware device of the online intelligence platform according to claim 1, characterized in that the Ethernet switch is a 24-port gigabit switch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201820583759.0U CN208013975U (en) | 2018-04-23 | 2018-04-23 | The hardware device of on-line intelligence ability platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN208013975U true CN208013975U (en) | 2018-10-26 |
Family
ID=63893366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201820583759.0U Active CN208013975U (en) | 2018-04-23 | 2018-04-23 | The hardware device of on-line intelligence ability platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN208013975U (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN109062929A (en) * | 2018-06-11 | 2018-12-21 | 上海交通大学 | Query task communication method and system
CN109062929B (en) * | 2018-06-11 | 2020-11-06 | 上海交通大学 | Query task communication method and system
WO2020199560A1 (en) * | 2019-04-03 | 2020-10-08 | 华为技术有限公司 | AI training network and method
WO2021063026A1 (en) * | 2019-09-30 | 2021-04-08 | 华为技术有限公司 | Inference service networking method and apparatus
CN113315794A (en) * | 2020-02-26 | 2021-08-27 | 宝山钢铁股份有限公司 | Hardware architecture of computing system network for online intelligent analysis of blast furnace production
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN208013975U (en) | Hardware device of an online intelligence platform | |
CN104917843B (en) | Seamless interfacing system for cloud storage and medical images | |
CN104135514B (en) | Fusion-type virtual storage system | |
CN102404201B (en) | Method for realizing maximum bandwidth of the Lustre concurrent file system | |
Shipman et al. | The Spider center-wide file system: from concept to reality | |
CN102625608A (en) | Design method for large-scale multi-node server cabinets | |
US11102907B2 (en) | Serviceability of a networking device with orthogonal switch bars | |
CN104951024B (en) | Big-data all-in-one machine for electric power applications | |
CN105159617A (en) | Pooled storage system architecture | |
CN106919533B (en) | 4U high-density storage server | |
CN207764844U (en) | Data processing system | |
CN106814976A (en) | Cluster storage system and data interaction method using it | |
US11055252B1 (en) | Modular hardware acceleration device | |
CN206649427U (en) | Server architecture including a dual-controller storage system | |
CN103677097B (en) | Server rack system and server | |
CN107729200A (en) | Storage system performance testing method and related apparatus | |
CN108090011A (en) | SAS switch controller expansion architecture and design method | |
CN206649421U (en) | All-in-one machine structure | |
CN102799708B (en) | GPU high-performance computing platform for electromagnetic simulation | |
CN205015812U (en) | Big-data all-in-one machine and rack for electric power applications | |
CN204965251U (en) | All-in-one device for power equipment monitoring | |
CN106528463A (en) | Four-subnode star server system with hard disk sharing | |
CN206649424U (en) | VHD green node server | |
CN206649422U (en) | Central processing unit hot-plug structure | |
CN113741642A (en) | High-density GPU server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GR01 | Patent grant | |