Summary of the Invention
To solve the problems that the computing burden of computing nodes and the storage burden of memory are heavy and that the data transmission delay is high, the present invention provides an all-in-one machine. The technical solution is as follows:
In a first aspect, an all-in-one machine is provided, the all-in-one machine including:
a plurality of computing nodes, a plurality of cache units, a low-latency (LL) switching matrix unit, and an input/output interface;
the plurality of computing nodes are connected to the plurality of cache units, and the LL switching matrix unit is connected to the plurality of cache units and to the input/output interface respectively;
wherein the input/output interface is configured to transmit input/output data; the cache unit is configured to store cache consistency data and to transmit the cache consistency data to the computing node after preprocessing it; and the LL switching matrix unit is configured to convert between the input/output data and the cache consistency data.
With reference to the first aspect, in a first implementable manner,
each computing node is connected to a corresponding cache unit through an external Peripheral Component Interconnect Express (PCIe) channel.
With reference to the first aspect, in a second implementable manner,
the all-in-one machine includes a PCIe switch unit,
the PCIe switch unit being connected to the plurality of computing nodes and to the plurality of cache units respectively, and configured to convert between the cache consistency data and the PCIe data supported by the plurality of computing nodes.
With reference to any one of the first aspect through the second implementable manner, in a third implementable manner,
the cache unit includes a cache controller, and the cache controller is configured to preprocess the cache consistency data.
With reference to the third implementable manner, in a fourth implementable manner,
the cache controller is built from an Advanced RISC Machine (ARM) processor, data can be exchanged among multiple ARM processors, and each cache controller includes a plurality of memory modules.
With reference to the fourth implementable manner, in a fifth implementable manner,
the memory module is a dual in-line memory module (DIMM).
With reference to the third implementable manner, in a sixth implementable manner,
the cache controller includes at least one of a field-programmable gate array (FPGA) unit, a reduced instruction set computer (RISC), and an application-specific integrated circuit (ASIC).
With reference to the first aspect, in a seventh implementable manner,
the preprocessing includes at least one of error correction processing, unpacking processing, and packing processing.
The present invention provides an all-in-one machine. Because the plurality of computing nodes are connected to the plurality of cache units, and the LL switching matrix unit is connected to the plurality of cache units and to the input/output interface respectively, input/output data can be converted into cache consistency data by the LL switching matrix unit, and the cache consistency data is transmitted to the computing node after being preprocessed by the cache unit. The all-in-one machine can therefore, within a cache consistency architecture, receive the input/output data processed by the LL switching matrix unit through the externally attached cache unit, which reduces the computing burden of the computing node and the storage burden of the memory and keeps the data transmission delay low.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides an all-in-one machine. As shown in Fig. 1, the all-in-one machine includes: a plurality of computing nodes (Ser) 01, a plurality of cache units 02, a low-latency (LL) switching matrix unit 03, and an input/output interface 04.
The plurality of computing nodes 01 are connected to the plurality of cache units 02, and the LL switching matrix unit 03 is connected to the plurality of cache units 02 and to the input/output interface 04 respectively.
The input/output interface 04 is configured to transmit input/output data. The cache unit 02 is configured to store cache consistency data and to transmit the cache consistency data to the computing node 01 after preprocessing it. The LL switching matrix unit 03 is configured to convert between the input/output data and the cache consistency data.
The LL switching matrix unit is a high-frequency, high-bandwidth interconnect structure in a multi-core processing chip and has a low-latency characteristic.
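The connections of Fig. 1 described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the implementation of the invention: the class names, the `convert` record format, and the `target` index are hypothetical, and the preprocessing step is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingNode:
    """Computing node 01: receives preprocessed data from its cache unit."""
    name: str
    inbox: list = field(default_factory=list)


@dataclass
class CacheUnit:
    """Cache unit 02: stores CC data, preprocesses it, forwards it to its node."""
    node: ComputingNode
    stored: list = field(default_factory=list)

    def preprocess(self, cc_data: dict) -> dict:
        # Placeholder for error correction / unpacking / packing (assumption).
        return {**cc_data, "preprocessed": True}

    def accept(self, cc_data: dict) -> None:
        self.stored.append(cc_data)                       # store the CC data
        self.node.inbox.append(self.preprocess(cc_data))  # then forward it


class LLSwitchingMatrix:
    """LL switching matrix unit 03: sits between interface 04 and the cache units."""

    def __init__(self, cache_units):
        self.cache_units = list(cache_units)

    def convert(self, io_data: bytes) -> dict:
        # Hypothetical conversion: wrap raw I/O data in a CC-tagged record.
        return {"kind": "CC", "payload": io_data}

    def receive_io(self, io_data: bytes, target: int) -> None:
        # Models data arriving from the input/output interface 04.
        self.cache_units[target].accept(self.convert(io_data))


nodes = [ComputingNode(f"Ser{i}") for i in range(2)]
matrix = LLSwitchingMatrix(CacheUnit(n) for n in nodes)
matrix.receive_io(b"request", target=1)
print(nodes[1].inbox)
```

In this sketch the computing node never handles raw input/output data; it only sees records that have already been converted and preprocessed, which is the offloading effect the embodiment relies on.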
In summary, in the all-in-one machine provided by this embodiment of the present invention, the plurality of computing nodes are connected to the plurality of cache units, and the LL switching matrix unit is connected to the plurality of cache units and to the input/output interface respectively, so that input/output data can be converted into cache consistency data by the LL switching matrix unit, and the cache consistency data is transmitted to the computing node after being preprocessed by the cache unit. The all-in-one machine can therefore, within a cache consistency architecture, receive the input/output data processed by the LL switching matrix unit through the externally attached cache unit, which reduces the computing burden of the computing node and the storage burden of the memory and keeps the data transmission delay low.
It should be noted that each computing node 01 may be connected to a corresponding cache unit 02 through an external Peripheral Component Interconnect Express (PCIe) channel. On a PCIe bus, a device attached to the bus is called an endpoint (EP); in this embodiment of the present invention, the EP is the cache unit 02.
PCIe is a general-purpose bus specification. A PCIe bus uses point-to-point connections: each end of a PCIe link can connect to only one device, and the two devices act as data transmitter and data receiver for each other.
Further, the cache unit 02 may include a cache controller, and the cache controller is configured to preprocess the cache consistency data. Optionally, the cache controller is built from an Advanced RISC Machine (ARM) processor, data can be exchanged among multiple ARM processors, and each cache controller includes a plurality of memory modules. As shown in Fig. 1, the memory module may be a dual in-line memory module (DIMM).
The preprocessing that the cache unit 02 performs on the cache consistency data may include at least one of error correction processing, unpacking processing, and packing processing. For the detailed procedures of error correction processing, unpacking processing, and packing processing, reference may be made to the prior art; they are not described again in this embodiment of the present invention.
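The text leaves the concrete preprocessing procedures to the prior art. Purely as an illustration of what packing, unpacking, and error correction can look like, the following Python sketch uses a hypothetical frame layout (a 2-byte length header followed by three copies of the payload) and a bitwise majority vote as the error correction step. The frame format and the repetition code are assumptions chosen for brevity and are not taken from the source.

```python
import struct

# Hypothetical frame header: 2-byte big-endian payload length (assumption).
HEADER = struct.Struct(">H")


def pack(payload: bytes) -> bytes:
    """Packing: prefix the length and append three copies of the payload."""
    return HEADER.pack(len(payload)) + payload * 3


def unpack(frame: bytes):
    """Unpacking: split a frame back into its three payload copies."""
    (n,) = HEADER.unpack_from(frame)
    body = frame[HEADER.size:]
    if len(body) != 3 * n:
        raise ValueError("frame length does not match header")
    return body[:n], body[n:2 * n], body[2 * n:]


def correct(copies) -> bytes:
    """Error correction: bitwise majority vote across the three copies.

    Any bit that is corrupted in only one copy is restored. The length
    header itself is not protected in this simplified sketch.
    """
    a, b, c = copies
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


frame = bytearray(pack(b"cache line"))
frame[HEADER.size + 12] ^= 0xFF          # corrupt one byte of the second copy
recovered = correct(unpack(bytes(frame)))
print(recovered == b"cache line")
```

A real cache controller would more plausibly use a hardware ECC scheme; the repetition code is used here only because it corrects errors in a few lines.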
Specifically, in Fig. 1 the direction of the arrow between a computing node 01 and a cache unit 02 indicates the master-slave relationship between them: the computing node 01 is the master unit, and the cache unit 02 is the slave unit of the computing node 01.
Further, the 40GE/IB FDR in Fig. 1 is the input/output interface 04, which is configured to transmit input/output data. 40GE/IB FDR denotes a general-purpose broadband input/output interface: 40GE refers to Ethernet with a physical bandwidth of 40 Gb/s (gigabits per second), and IB FDR refers to an InfiniBand interface with a physical bandwidth of 56 Gb/s. The LL switching matrix unit 03 may receive the input/output data from the input/output interface 04 through a super network interface card (SNIC). The LL switching matrix unit 03 may also convert the input/output data, that is, non-cache-consistency (NCC) data, into cache consistency (CC) data, so that both CC data and NCC data can be transmitted to the cache unit 02 over the LL bus (labeled LL in the figure). In Fig. 1, the LL switching matrix unit 03 contains two kinds of arrows: arrow 1 indicates the flow of NCC data, and arrow 2 indicates the flow of CC data.
It should be noted that, first, the all-in-one machine provided by this embodiment of the present invention can transmit cache consistency data and input/output data together within the cache consistency system formed by the cache units and the LL switching matrix unit, and the LL switching matrix unit itself has a low-latency characteristic; the all-in-one machine therefore makes full use of the low latency of the original cache consistency system. Second, the cache controller of the cache unit includes an ARM processor in which a direct memory access (DMA) controller is integrated; the all-in-one machine therefore makes full use of the DMA transfer mode of that controller, in which data is exchanged automatically and in batches between high-speed peripherals and main memory under hardware control, an input/output mode that keeps processor intervention to a minimum. Third, the ARM processor also has low power consumption, so the standalone cache unit of the all-in-one machine has a low-power characteristic. Finally, to preserve the integrity of the cached data if the system unexpectedly loses power, and to allow the on-site data to be restored promptly once power returns, a power-failure protection unit can be added to the standalone cache unit; the reliability of data transmission in the all-in-one machine is therefore higher.
In summary, in the all-in-one machine provided by this embodiment of the present invention, the plurality of computing nodes are connected to the plurality of cache units, and the LL switching matrix unit is connected to the plurality of cache units and to the input/output interface respectively, so that input/output data can be converted into cache consistency data by the LL switching matrix unit, and the cache consistency data is transmitted to the computing node after being preprocessed by the cache unit. The all-in-one machine can therefore, within a cache consistency architecture, receive the input/output data processed by the LL switching matrix unit through the externally attached cache unit, which reduces the computing burden of the computing node, the storage burden of the memory, and the data transmission delay.
An embodiment of the present invention provides another all-in-one machine. As shown in Fig. 2, the all-in-one machine includes: a plurality of computing nodes (Ser) 01, a plurality of cache units 02, an LL switching matrix unit 03, an input/output interface 04, and a PCIe switch unit 05.
The PCIe switch unit (PCIe switch, abbreviated PCIe SW) 05 is connected to the plurality of computing nodes 01 and to the plurality of cache units 02 respectively, and is configured to convert between the cache consistency data and the PCIe data supported by the plurality of computing nodes 01. It should be noted that the PCIe switch unit can pool the input/output cache by means of a non-transparent (NT) bridge, so that the plurality of computing nodes 01 share the cache in the plurality of cache units 02 and the computing nodes 01 and the cache units 02 are no longer in a one-to-one relationship. Further, in a multi-host system, each host can access the system through the NT bridge, and by means of the address translation capability of the NT bridge every host can access every EP. In other words, through the NT bridge each computing node 01 in this embodiment of the present invention can access all of the cache units 02, that is, the input/output cache is pooled. The concept of pooling can be illustrated with a server pool: a server pool can be understood as one super server whose resources are distributed across all of the servers that make up the pool. Through unified management and operation of the server pool, in particular by balancing, coordinating, and scheduling the resources of the multiple servers, the existing computing resources can be exploited to the fullest. The process of integrating multiple redundant servers into one super server with high reliability and high scalability can therefore be called server "pooling".
For the detailed implementation of the NT bridge technique, reference may be made to the prior art, and details are not described here. As shown in Fig. 2, the PCIe switch unit 05 contains two kinds of arrows: arrow 11 indicates the flow of NCC data, and arrow 22 indicates the flow of CC data.
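The pooling effect of the NT bridge's address translation can be illustrated with a small Python sketch. It models only the idea that several cache units 02 appear to every computing node 01 as one contiguous address range; the class name `NTBridgePool`, the window sizes, and the `translate` method are hypothetical, and real NT bridge configuration is not modeled.

```python
from bisect import bisect_right


class NTBridgePool:
    """Maps one contiguous, node-visible address space onto several cache units."""

    def __init__(self, unit_sizes):
        if any(size <= 0 for size in unit_sizes):
            raise ValueError("each cache unit must expose a positive size")
        self.bases = []            # first pooled address served by each unit
        base = 0
        for size in unit_sizes:
            self.bases.append(base)
            base += size
        self.total = base          # size of the pooled address space

    def translate(self, addr: int):
        """Return (cache unit index, offset inside that unit) for a pooled address."""
        if not 0 <= addr < self.total:
            raise ValueError("address outside the pooled range")
        unit = bisect_right(self.bases, addr) - 1
        return unit, addr - self.bases[unit]


# Three cache units with hypothetical window sizes of 4 KiB, 4 KiB, and 8 KiB.
pool = NTBridgePool([4096, 4096, 8192])
print(pool.translate(0))       # first byte of unit 0
print(pool.translate(4096))    # first byte of unit 1
print(pool.translate(8200))    # offset 8 inside unit 2
```

Because the lookup depends only on the pooled address and not on which computing node issues the request, any node can reach any cache unit, which is what removes the one-to-one pairing of Fig. 1.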
The LL switching matrix unit 03 is connected to the plurality of cache units 02 and to the input/output interface 04 respectively. Specifically, the 40GE/IB FDR in Fig. 2 is the input/output interface 04, and the LL switching matrix unit 03 may receive the input/output data from the input/output interface 04 through the SNIC.
The input/output interface 04 is configured to transmit input/output data. The cache unit 02 is configured to store cache consistency data and to transmit the cache consistency data to the computing node 01 through the PCIe switch unit 05 after preprocessing it. The LL switching matrix unit 03 is configured to convert between the input/output data and the cache consistency data.
Each computing node 01 may be connected to a corresponding cache unit 02 through an external PCIe channel and the PCIe switch unit 05.
Further, the cache unit 02 may include a cache controller, and the cache controller is configured to preprocess the cache consistency data. The cache controller may include at least one of a field-programmable gate array (FPGA) unit, a reduced instruction set computer (RISC), and an application-specific integrated circuit (ASIC).
Further, each cache controller may also include a plurality of memory modules. For example, the memory module may be a DIMM.
The preprocessing that the cache unit 02 performs on the cache consistency data may include at least one of error correction processing, unpacking processing, and packing processing.
It should be noted that, for other explanations of Fig. 2, reference may be made to the description of Fig. 1 in the foregoing embodiment, and details are not repeated here.
It should be noted that, first, the all-in-one machine provided by this embodiment of the present invention performs distributed computing and caching: an application is built as a set of small services, each service runs independently in its own process, and each service can be deployed independently; the all-in-one machine is therefore well suited to a microservice architecture. Second, the PCIe bus is used to connect the external devices of the processor system while the input/output data is transmitted through the cache consistency system; the all-in-one machine therefore combines the PCIe architecture with the cache consistency system and implements data transmission between heterogeneous networks. Third, the ARM processor in a cache unit is first virtualized into an input/output cache visible to the computing node, and a PCIe switch unit is then added to the PCIe bus so that multiple ARM processors can be further virtualized, through the NT bridge, into an input/output cache in a contiguous address space visible to the computing nodes; the all-in-one machine therefore implements two-level virtualization. Finally, the input/output cache is pooled, so a computing node of the all-in-one machine can obtain the input/output cache in any of the multiple ARM processors, and the computing nodes and the cache units are no longer in a one-to-one relationship.
In summary, in the all-in-one machine provided by this embodiment of the present invention, the plurality of computing nodes are connected to the plurality of cache units through the PCIe switch unit, and the LL switching matrix unit is connected to the plurality of cache units and to the input/output interface respectively, so that input/output data can be converted into cache consistency data by the LL switching matrix unit, and the cache consistency data is transmitted to the computing node through the PCIe switch unit after being preprocessed by the cache unit. The all-in-one machine can therefore share the input/output cache and, within a cache consistency architecture, obtain the input/output cache through the cache units and the PCIe switch unit, which reduces the computing burden of the computing node, the storage burden of the memory, and the data transmission delay, and also achieves load balancing and dynamic resource management.
An embodiment of the present invention provides yet another all-in-one machine. As shown in Fig. 3, the all-in-one machine includes: a plurality of computing nodes 01, a plurality of cache units 02, an LL switching matrix unit 03, an input/output interface 04, and a PCIe switch unit 05.
As shown in Fig. 2 or Fig. 3, the PCIe switch unit 05 is connected to the plurality of computing nodes 01 and to the plurality of cache units 02 respectively, and is configured to convert between the cache consistency data and the PCIe data supported by the plurality of computing nodes 01.
The LL switching matrix unit 03 is connected to the plurality of cache units 02 and to the input/output interface 04 respectively. In the LL switching matrix unit 03, Mem.Da denotes memory data, that is, CC data, and IO Da denotes input/output data, that is, NCC data.
The input/output interface 04 is configured to transmit input/output data. The cache unit 02 is configured to store cache consistency data and to transmit the cache consistency data to the computing node 01 through the PCIe switch unit after preprocessing it. The LL switching matrix unit 03 is configured to convert between the input/output data and the cache consistency data.
Each computing node 01 may be connected to a corresponding cache unit 02 through an external PCIe channel and the PCIe switch unit 05.
Further, the cache unit 02 may include a cache controller, and the cache controller is configured to preprocess the cache consistency data. The cache controller may include at least one of an FPGA unit, a RISC, and an ASIC.
Each cache controller may also include a plurality of memory modules. For example, the memory module may be a dual in-line memory module (DIMM).
Optionally, each cache controller may further include a storage unit 06, and each storage unit 06 may be connected to the cache unit 02 through a Serial Advanced Technology Attachment (SATA) interface. For example, the storage unit 06 may be a hard disk drive (HDD) or a solid-state drive (SSD). In this embodiment of the present invention, SBY (standby power) may also be used to protect the storage unit 06 against power failure. Specifically, SATA transmits data serially, one bit at a time. The SATA bus uses an embedded clock signal and has good error correction capability: it can check both the transmitted data and the transmitted commands and automatically correct any error it finds, which improves the reliability of data transmission. The HDD is the most basic form of computer storage; for example, the hard disks commonly seen in a computer (drive C, drive D) are HDDs. An SSD is a drive built from an array of solid-state electronic storage chips and consists of a control unit and a storage unit. An SSD is identical to an ordinary hard disk in interface specification and definition, function, and usage, and is also consistent with an ordinary hard disk in product form factor and size.
As shown in Fig. 3, Stor.Da in the PCIe switch unit 05 denotes storage data. Of the arrows between the cache unit 02 and the PCIe switch unit 05, arrow 3 indicates the flow of memory data and arrow 4 indicates the flow of storage data.
The preprocessing that the cache unit 02 performs on the cache consistency data may include at least one of error correction processing, unpacking processing, and packing processing.
It should be noted that, first, the all-in-one machine provided by this embodiment of the present invention deploys a plurality of small-capacity storage units in a distributed manner and thereby implements distributed storage of data; the all-in-one machine therefore has higher stability and reliability. Second, because a storage unit is added to the cache unit, memory data and storage data are shunted into separate flows; the delay of the externally attached storage of the all-in-one machine is therefore lower.
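The shunting of memory data and storage data can be sketched as a simple classification step. The following Python fragment is a hedged illustration only: the `DataType` enum, the tuple-based record format, and the `shunt` function are assumptions made for this example, with the enum values borrowing the Mem.Da and Stor.Da labels of Fig. 3.

```python
from enum import Enum


class DataType(Enum):
    MEMORY = "Mem.Da"     # memory data (flow of arrow 3 in Fig. 3)
    STORAGE = "Stor.Da"   # storage data (flow of arrow 4 in Fig. 3)


def shunt(records):
    """Split (DataType, payload) records into one ordered flow per data type."""
    flows = {DataType.MEMORY: [], DataType.STORAGE: []}
    for dtype, payload in records:
        flows[dtype].append(payload)
    return flows


mixed = [
    (DataType.MEMORY, b"cache line A"),
    (DataType.STORAGE, b"block 17"),
    (DataType.MEMORY, b"cache line B"),
]
flows = shunt(mixed)
print(flows[DataType.MEMORY])
print(flows[DataType.STORAGE])
```

Keeping the two flows separate means a burst of storage traffic does not queue ahead of memory traffic, which is the intuition behind the lower delay stated above.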
In summary, in the all-in-one machine provided by this embodiment of the present invention, a storage unit is added to the cache unit, and the cache unit can shunt the cache consistency data according to data type, so that the computing node can obtain memory data and storage data separately. This reduces the delay of the externally attached storage of the all-in-one machine and improves the throughput of the all-in-one machine.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.