A machine learning and artificial intelligence application all-in-one machine deployment method
Technical field
The present invention relates to the field of machine learning and artificial intelligence, and more particularly to a deployment method for a machine learning and artificial intelligence application all-in-one machine.
Background technology
Artificial intelligence was first proposed as early as the 1950s. It is a cross-discipline in which cybernetics, information theory, computer science, mathematical logic, neurophysiology, psychology, linguistics, pedagogy, medicine, engineering technology, philosophy and other subjects interpenetrate. People dreamed of using the then-nascent computer to construct complex machines possessing the essential characteristics of human wisdom: omnipotent machines with all of our perception (or even more than a person's) and all of our rationality, able to think as we do. Machine learning studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills, and to reorganize existing knowledge structures so as to continuously improve their own performance. It is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span every field of artificial intelligence. The most basic approach of machine learning is to use algorithms to parse data, learn from it, and then make decisions and predictions about events in the real world. Unlike traditional hard-coded software programs written to solve a particular task, machine learning "trains" various algorithms with large amounts of data so that they learn from the data how to complete the task.
Machine learning is a very popular field in the development of artificial intelligence. The research goal of machine learning is to give computers the ability, like humans, to acquire knowledge from the real world, while establishing a computational theory of learning and constructing various learning systems for application in every field. Machine learning research has three main directions. The first starts from simulating the human learning process and aims to establish a cognitive and physiological model of learning; the development of this direction is closely related to cognitive science. The second is basic research: developing learning theories suited to the characteristics of machines, exploring all possible learning methods, and comparing the similarities, differences and connections between human learning and machine learning. The third is applied research: building practical learning systems or knowledge-acquisition aids, establishing automatic knowledge-acquisition systems in the application fields of artificial intelligence, accumulating experience, and improving knowledge bases and control knowledge, so that the intelligence level of machines approaches that of humans.
At present, technology giants including Baidu and Google invested between 20 and 30 billion dollars in artificial intelligence in 2016, of which 90% went to research, development and deployment, and the remaining 10% to acquisitions. The rate of external investment in artificial intelligence has tripled since 2013. Artificial intelligence development is mainly concentrated in high tech/telecommunications, automotive/assembly and financial services. Machine learning can genuinely help industry solve problems, particularly today's hot topics such as deep learning, whose influence on autonomous driving and artificial intelligence assistants is enormous for industry.
Big data has driven the development of artificial intelligence; at the same time, the development of artificial intelligence also allows data to produce enormous value, becoming "intelligent data". Artificial intelligence is now applied in all kinds of big data applications, such as search recommendation, shopping recommendation, speech recognition, image recognition, chatbots and intelligent medicine. Machine learning and artificial intelligence continue to grow on the foundation of big data. To make disorderly masses of data produce value, the data must be analyzed at large scale with complex network models before high-accuracy models can be trained, which demands an enormous amount of computation. Computing power has therefore become more and more important to the development of machine learning and artificial intelligence.
Current big data machine learning algorithms and artificial intelligence analyses use resources relatively inefficiently, with high resource occupancy and slow processing of massive data; moreover, massive data places high demands on hardware during processing, and cannot meet the intelligent-computing requirements of rapidly growing data-driven enterprises.
Summary of the invention
To remedy the deficiencies of the prior art, the object of the present invention is to provide a machine learning and artificial intelligence application all-in-one machine deployment method. The method employs a special design and a variety of optimization techniques so that the all-in-one machine has ultra-high computing performance, can significantly speed up the running of programs, and is suitable for machine learning and artificial intelligence applications in big data environments.
In order to realize the above object, the present invention adopts the following technical scheme: a machine learning and artificial intelligence application all-in-one machine deployment method, characterized by comprising the following steps:
Step 1: isolate data storage from data processing, build the overall system architecture using a highly scalable Shared-Nothing architecture, and logically separate the system architecture into an application layer, a computation layer and a storage layer, with the application layer, the computation layer and the storage layer all using a distributed architecture;
Step 2: build the network architecture, which is divided into a single-rack networking topology or a multi-rack networking topology and is logically divided into an external network, a management network, a computation network and a storage network;
Step 3: optimize the design of the system for scalability.
Further, the application layer configures a varying number of application nodes according to actual needs; the computation layer configures a varying number of compute nodes according to actual needs; and the storage layer configures a varying number of storage nodes according to actual needs.
Further, the compute nodes are configured with the following software stack:
support for a variety of programming languages;
APIs for machine learning and deep learning;
the integrated deep learning framework TensorFlow;
an integrated, optimized distributed computing framework Spark;
an integrated, optimized distributed in-memory file system Alluxio to accelerate data reads and writes;
integrated, optimized RDMA features.
Further, the storage nodes provide two kinds of storage service: database and general-purpose file system. The databases include the relational database PostgreSQL and a time-series database; the relational database PostgreSQL uses the HAWQ distributed architecture, and the time-series database uses an OpenTSDB+HBase distributed architecture. The general-purpose file system uses a mixed HDFS+Ceph structure, with the HAWQ bottom layer using HDFS.
Further, an external network NIC and a management network NIC are deployed on the application nodes; a management network NIC and a computation-storage network NIC are deployed on the compute nodes; and a management network NIC and a computation-storage network NIC are deployed on the storage nodes.
Further, the single-rack networking topology comprises one rack, constructed as follows:
provide one Ethernet switch whose port count is greater than or equal to the total number of nodes in the rack;
provide one computation-storage network switch whose port count is greater than or equal to the total number of nodes in the rack;
provide one external network switch.
Further, the multi-rack networking topology comprises multiple racks, constructed as follows:
each rack is provided with one Ethernet switch whose port count exceeds the total number of nodes in the rack, with ports reserved to connect other racks;
each rack is provided with one computation-storage network switch whose port count exceeds the total number of nodes in the rack, with ports reserved to connect other racks;
an appropriate number of external network switches are provided;
core switches are provided, and the management network switch of each rack is connected to a core switch in a simple tree.
Further, the computation-storage network switches are InfiniBand switches, and the InfiniBand switches of the racks connect to multiple core switches to form a fat-tree structure.
Further, optimizing the design of the system for scalability comprises the following steps:
use a horizontally scalable architecture to improve performance;
use the layered architecture to increase storage capacity.
Further, the step of using a horizontally scalable architecture to improve performance comprises:
increasing the number of compute nodes in the computation layer;
adding an appropriate number of network switches.
The present invention is advantageous in that:
(1) The all-in-one machine isolates data storage from data processing, adopting a highly scalable Shared-Nothing architecture in which clients, data processing and data storage are separated and logically divided into three levels: an application layer, a computation layer and a storage layer. Each level uses a distributed architecture, achieving high computational concurrency and data read/write concurrency while giving the whole system good scalability, reliability and maintainability.
(2) The distributed hyper-converged hardware architecture and its careful pairing with the software stack avoid wasting storage and computing resources, guarantee the stability of the data analysis pipeline, and improve analysis efficiency. Every aspect of the hardware architecture, including CPU, memory, tiered storage and GPU, has been specially optimized to fully exploit the hardware's capability. At the same time, frameworks such as TensorFlow are deeply integrated, and substantial optimizations are made to distributed machine learning algorithms and communication mechanisms.
(3) Through the architecture, algorithmic improvements and full use of the hardware, an order-of-magnitude computational speedup is achieved, reducing enterprises' investment in big data infrastructure and manpower. Through data cleansing and modeling analysis, high-quality, meaningful information is obtained, thereby mining the value of the data.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a schematic diagram of the overall framework of the system;
Fig. 3 is a schematic diagram of the component-deployment framework of the system.
Detailed description of embodiments
The present invention is specifically described below with reference to the drawings and specific embodiments.
With reference to Fig. 1, a machine learning and artificial intelligence application all-in-one machine deployment method of the present invention comprises the following steps:
Step 1: the all-in-one machine isolates data storage from data processing, adopting a highly scalable Shared-Nothing architecture that divides the whole into three layers: an application layer, a computation layer and a storage layer. The all-in-one machine's layered architecture provides fully redundant hardware protection: if any single compute node or storage node fails, it is guaranteed that no data is lost and the all-in-one machine continues to work normally, greatly improving the reliability of the system.
Clients, data processing and data storage are separated and logically divided into three levels. Each level uses a distributed architecture, which achieves high computational concurrency and data read/write concurrency while giving the whole system good scalability, reliability and maintainability.
Among these, the application layer mainly runs user-interface services, handling work such as login, monitoring, management, and compute-task orchestration/submission; it requires medium CPU and memory configuration and low storage-capacity configuration. The computation layer executes the compute tasks submitted by users; it requires high CPU and memory configuration and low storage-capacity configuration. The storage layer mainly provides mass storage for the compute nodes; it requires low CPU and memory configuration and high storage-capacity configuration. As shown in Fig. 2, the all-in-one machine can also flexibly configure varying numbers of application nodes, compute nodes and storage nodes according to different actual needs; it is highly scalable and integrates functions such as a Web UI, resource control, system monitoring, resource scheduling and task management.
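As a minimal sketch (not part of the claimed method), the three node roles and their resource profiles described above can be modeled as a configuration table; the concrete node counts and profile structure below are illustrative assumptions:

```python
# Illustrative resource profiles for the three node roles described above.
# "medium" / "high" / "low" follow the text; the structure itself is an assumption.
PROFILES = {
    "application": {"cpu": "medium", "memory": "medium", "storage": "low"},
    "compute":     {"cpu": "high",   "memory": "high",   "storage": "low"},
    "storage":     {"cpu": "low",    "memory": "low",    "storage": "high"},
}

def build_cluster(app_nodes, compute_nodes, storage_nodes):
    """Assemble a cluster description with a varying number of nodes per layer."""
    cluster = []
    for role, count in (("application", app_nodes),
                        ("compute", compute_nodes),
                        ("storage", storage_nodes)):
        for i in range(count):
            cluster.append({"role": role, "id": f"{role}-{i}", **PROFILES[role]})
    return cluster

cluster = build_cluster(app_nodes=2, compute_nodes=8, storage_nodes=4)
print(len(cluster))  # 14
```

Because each layer is independent, changing one count scales that layer alone, which is exactly the flexibility the text attributes to the architecture.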
With reference to Fig. 3: by providing a Web UI, the application nodes let users conveniently perform task management, system monitoring, resource management and other administration. As the outermost layer of the all-in-one machine, the application nodes are exposed to user operation. Specifically, the application nodes provide the following functional interfaces: application management (application submission / application deletion / application status query), data storage and query (structured storage interface / unstructured storage interface), file management (copy / paste / upload / download / create / move / delete), resource monitoring (GPU / CPU / Memory / Network / Disk / others), and administration (resource management / role management / user management / assurance management / node administration).
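A minimal sketch of how the functional interfaces listed above might be catalogued and validated on an application node; the group and action names are hypothetical labels for illustration, not an API defined by the method:

```python
# Hypothetical catalogue of the application-node interfaces listed above.
INTERFACES = {
    "application": ["submit", "delete", "status"],
    "storage":     ["structured", "unstructured"],
    "file":        ["copy", "paste", "upload", "download", "create", "move", "delete"],
    "monitoring":  ["gpu", "cpu", "memory", "network", "disk"],
    "admin":       ["resource", "role", "user", "assurance", "node"],
}

def dispatch(group, action):
    """Validate a request against the catalogue before routing it to a handler."""
    if group not in INTERFACES or action not in INTERFACES[group]:
        raise ValueError(f"unknown interface: {group}/{action}")
    return f"{group}:{action}"   # a real application node would invoke a handler here

print(dispatch("application", "status"))  # application:status
```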
The compute nodes are optimized for the heavy consumption of computing resources, using a specially designed software stack:
A. support is provided for a variety of programming languages, such as Python, R, Java and Scala;
B. APIs for machine learning and deep learning are provided, along with some other general-purpose computation APIs;
C. the deep learning framework TensorFlow is integrated. A large number of applications are developed on TensorFlow as a deep learning framework, and the all-in-one machine integrates it so that applications developed on this framework can run on the machine directly.
D. an optimized distributed computing framework, Spark, is integrated. Spark is an efficient distributed computing system; on this basis, Spark's underlying algorithm libraries are optimized so that distributed tasks run faster on each compute node.
E. an optimized distributed in-memory file system, Alluxio, is integrated to accelerate data reads and writes. Alluxio is a distributed in-memory file system that allows files to be shared reliably at memory speed within a cluster framework; on this basis, it is further optimized so that the scheduling framework can make better use of Alluxio's distributed-memory characteristics.
F. optimized RDMA features are integrated: JXIO. RDMA (Remote Direct Memory Access) technology addresses the latency of server-side data processing during network transmission. RDMA passes data over the network directly into a computer's memory region, moving data quickly from one system into the memory of a remote system without any impact on the operating system, and therefore uses very little CPU resource. It eliminates external memory copies and context switches, freeing memory bandwidth and CPU cycles to improve application performance.
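The role Alluxio plays in the stack above, a memory tier in front of slower backing storage, can be illustrated with a toy read-through cache; this is a pure-Python sketch of the idea, not the Alluxio API:

```python
class MemoryTier:
    """Toy read-through cache: serve repeated reads from memory, fall back to a backing store."""
    def __init__(self, backing_store):
        self.backing_store = backing_store   # a dict standing in for a slow file system
        self.cache = {}                      # the in-memory tier

    def read(self, path):
        if path in self.cache:               # memory-speed hit
            return self.cache[path]
        data = self.backing_store[path]      # slow path: fetch from the backing store
        self.cache[path] = data              # promote into the memory tier
        return data

store = {"/data/train.csv": b"feature,label\n1,0\n"}
tier = MemoryTier(store)
first = tier.read("/data/train.csv")         # miss: loaded from the backing store
second = tier.read("/data/train.csv")        # hit: served from memory
print(first == second)  # True
```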
Further, in keeping with the distributed architecture of the all-in-one machine, the optimized computing platform can be deployed onto each compute node. Because the upper layer uses Mesos for task scheduling and resource management, every compute node plays an identical role, with no distinction between master and worker nodes.
The storage nodes of the all-in-one machine are responsible for providing storage, mainly offering two kinds of storage service: database and general-purpose file system. The databases fall into two classes: the relational database PostgreSQL and a time-series database. The distributed cluster scheme adopts HAWQ for PostgreSQL and OpenTSDB+HBase for the time-series database. With reference to Fig. 3, the file system likewise uses a distributed structure, a mixed HDFS+Ceph structure: the HAWQ bottom layer uses HDFS, while the other components use Ceph for storage.
Accordingly, the upper-layer data management tool can automatically choose, based on a file's storage mode, whether data is stored on HDFS or on Ceph. The database software layer is deployed in the computing cluster and the file system software in the storage cluster, so that the data-management functions of each cluster are positioned as follows:
1) Application cluster:
provide a unified data access interface;
provide import/export interfaces for large-scale data;
deploy the management-software client of the database;
deploy monitoring tools for database status.
2) Computing cluster:
deploy the database management system;
provide SQL/REST API interfaces.
3) Storage cluster:
use the mixed HDFS/Ceph distributed file system;
support block storage and object storage.
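The automatic selection between HDFS and Ceph described above can be sketched as a simple routing rule; the function and mode names are illustrative assumptions, with HAWQ-backed structured data routed to HDFS and the other components to Ceph, as the text states:

```python
def choose_backend(storage_mode):
    """Route data to HDFS or Ceph according to its storage mode (illustrative rule)."""
    # HAWQ's bottom layer uses HDFS; other components use Ceph for storage.
    hdfs_modes = {"hawq", "structured"}
    return "hdfs" if storage_mode in hdfs_modes else "ceph"

print(choose_backend("hawq"))    # hdfs
print(choose_backend("object"))  # ceph
```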
Step 2: build the network architecture, which comes in two forms, a single-rack networking topology and a multi-rack networking topology, and is logically divided into an external network, a management network, a computation network and a storage network.
External network: connects to the user-facing switches and provides the network through which the all-in-one machine's services are accessed externally. The external network interface uses ordinary 1 Gbps Ethernet, and external network NICs are deployed only on the application nodes.
Management network: used to monitor and manage each node of the all-in-one machine and to submit compute tasks to the compute nodes. These tasks do not place high demands on network bandwidth or latency; to avoid affecting the computation and storage networks, ordinary 1 Gbps Ethernet independent of them is used, and a management network NIC must be deployed on every node.
Computation network: connects the compute nodes; its latency requirements are very high (the high-spec version uses InfiniBand NICs).
Storage network: connects the storage nodes; its bandwidth and latency requirements are very high. Here, high-bandwidth, low-latency 56 Gbps InfiniBand (the standard version may use 10 Gbps RoCE NICs) merges the storage network and the computation network into a single network. InfiniBand NICs are deployed on all compute nodes and storage nodes; considering that the application nodes may also need to access data on the storage nodes, deploying InfiniBand NICs on the application nodes may be considered as well.
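The NIC placement rules for the four network planes above can be summarized in a small table-driven helper; the NIC labels are illustrative, and the optional InfiniBand NIC on application nodes is modeled as a flag, per the text:

```python
def nics_for(role, app_access_storage=False):
    """Return the NICs to deploy on a node of the given role (labels are illustrative)."""
    nics = {
        "application": ["external-eth", "management-eth"],
        "compute":     ["management-eth", "compute-storage-ib"],
        "storage":     ["management-eth", "compute-storage-ib"],
    }[role]
    # Application nodes may optionally get an InfiniBand NIC to reach storage data.
    if role == "application" and app_access_storage:
        nics = nics + ["compute-storage-ib"]
    return nics

print(nics_for("compute"))  # ['management-eth', 'compute-storage-ib']
```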
The all-in-one machine is installed in units of racks, and each rack can hold several application nodes, compute nodes and storage nodes. Each rack is equipped with one Ethernet switch (management network) and one InfiniBand switch (computation-storage network), and the port count of each switch should be no less than the total number of nodes in that rack. If multiple racks are to be connected, the switches must also reserve a certain number of ports for connecting to the other racks. As for external network switches, considering that application nodes are relatively few, multiple racks may share one switch. Networking multiple racks requires additional core switches to connect the racks: the management network switches of the racks can converge to a single core switch in a simple tree, whereas the InfiniBand computation-storage network switches require multiple core switches forming a fat-tree structure, to guarantee a full-bandwidth path between any two nodes.
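The per-rack switch sizing rule above (port count at least the number of nodes in the rack, plus reserved ports when racks are interconnected) can be checked with a one-line predicate; the port and uplink counts below are illustrative:

```python
def rack_switch_ok(ports, nodes_in_rack, uplinks_reserved=0):
    """A rack switch is adequate if its ports cover every node plus any reserved uplinks."""
    return ports >= nodes_in_rack + uplinks_reserved

# Single rack: the port count only needs to cover the nodes.
print(rack_switch_ok(ports=48, nodes_in_rack=40))                      # True
# Multi-rack: ports must also be reserved for connecting the other racks.
print(rack_switch_ok(ports=48, nodes_in_rack=46, uplinks_reserved=4))  # False
```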
Step 3: the deployment of the all-in-one machine is designed to give it good system scalability, divided broadly into performance scaling and storage-capacity scaling.
The all-in-one machine uses a horizontally scalable architecture: the overall computing resources (GPU/CPU/Memory) can be increased by adding compute nodes to the computation layer, thereby raising the running speed of application programs. When the number of compute nodes grows sharply, the network may become a bottleneck; to keep computing resources (GPU/CPU/Memory), storage and network in balance, an appropriate number of network switches must be added to resolve the bottleneck. Because the layered architecture of the all-in-one machine separates the data storage units from the computation processing units, storage capacity can be increased laterally and directly whenever large amounts of storage are needed, which is very convenient.
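The horizontal-scaling argument above can be sketched numerically; the per-node resource figures are purely illustrative assumptions:

```python
def scale_out(compute_nodes, gpu_per_node=4, cpu_per_node=32, mem_gb_per_node=256):
    """Aggregate compute resources grow linearly with the number of compute nodes."""
    return {
        "gpu":    compute_nodes * gpu_per_node,
        "cpu":    compute_nodes * cpu_per_node,
        "mem_gb": compute_nodes * mem_gb_per_node,
    }

before = scale_out(8)
after = scale_out(16)   # doubling the computation layer doubles every resource pool
print(before["gpu"], after["gpu"])  # 32 64
```

Linear growth in the resource pools is what makes the network the eventual bottleneck, which is why the text pairs node scaling with adding switches.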
The basic principles, principal features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited in any way by the above embodiments; all technical schemes obtained by means of equivalent substitution or equivalent transformation fall within the protection scope of the present invention.