CN107766147A - Distributed data analysis task scheduling system - Google Patents
Distributed data analysis task scheduling system
- Publication number
- CN107766147A (application CN201610712300.1A)
- Authority
- CN
- China
- Prior art keywords
- distributed
- task
- data
- data analysis
- task scheduling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
Abstract
The invention provides a distributed data analysis task scheduling system comprising a distributed data storage service module, a resource-based distributed task scheduling engine module, a distributed message queue module, a distributed application coordination service module, and an automatic execution engine module. The invention proposes a distributed task scheduling framework for data analysis in the R language. It uses a resource management platform for distributed resource management and thereby achieves distributed scheduling and execution of R analyses. An automatic scheduling engine invokes tasks on data shards automatically, meeting the automated analysis needs of industrial process data tracking.
Description
Technical field
The present invention relates to data analysis task scheduling, and in particular to a distributed data analysis task scheduling system that can be widely applied to scheduling the execution of data analysis formulas for industrial process data.
Background art
With the continued advance of Industry 4.0 and the development of technologies such as the Internet of Things, object-tracking devices such as mobile devices and RFID are used ever more widely in industrial production, and explosive data growth is becoming the trend. At the same time, the push toward fine-grained enterprise management requires the analysis of larger data volumes and wider data dimensions to support business decisions. After years of using enterprise information systems, many enterprises have accumulated large amounts of historical data but do not analyze or use it sufficiently. How can the large volume of existing and newly generated business data be fully mined and used? A comparative study of the BI tools traditionally used in industry shows the following main deficiencies:
(1) Large numbers of data analysis tasks cannot be scheduled across nodes automatically, so tasks often pile up on a single node, which consumes excessive node resources and lowers execution efficiency.
(2) Data analysis tasks have a single point of failure: when an analysis task running on a single node fails, there is no mechanism to restart it.
(3) An effective timed task scheduling function is lacking, so the scheduling needs of analysis tasks that run on a fixed cycle or at fixed intervals cannot be met.
(4) Source data and analysis results are generally not stored in a distributed structure, so results are easily affected by the failure of the single machine that stores them, which may cause data loss.
R is an open-source language developed specifically for statistics and data analysis. It is compatible with many operating systems, its programs are concise, and it is a programming platform that statistical analysts prefer. Its collection of mature data mining packages is rich and keeps growing, and it has powerful modules for visualizing analysis results, such as the layered graphics of ggplot. Its weaknesses are that it handles large text poorly and that, although its data analysis components are strong, its data management is lacking. Data therefore often has to be split in an external environment before being returned to the R platform for analysis.
Among published papers, "Application of the R Language in Big Data Processing" by Yang Xia and Wu Dongwei mainly introduces the features and usage of the RHadoop extension packages from Revolution Analytics, with which Map-Reduce programs can be written in R. "Integration Technology Based on the R Language and Hadoop and Its Implementation" by Liu Wenfei mainly describes integrating and executing R programs through Hadoop Streaming.
The patent document with application number CN201610074884.4, titled "Task sending system of a distributed computing framework", discloses a task sending system that includes an application server, a task queue service platform, and a Redis service platform. The application server deploys multiple business processing services. The task queue service platform consists of multiple task servers clustered over a network; a ZooKeeper service is deployed on it, and task scheduling jobs handle new customer tasks in the message queue of the ZooKeeper service. The Redis service platform consists of multiple Redis servers connected over a network and is itself connected to the task queue service platform over the network. The Redis service platform invokes a processing process for each new customer task added to the message queue; the processing process cleans the customer task, outputs a first business result, and stores it in the Redis cache. When the real-time computing module of the Redis service platform detects a new first business result in the Redis cache, it computes on that result and outputs a second business result.
Comparison of technical points: the technical structure of the present invention differs markedly from that of the patent document. Both are based on ZooKeeper, but for different purposes: that invention uses the ZooKeeper service for the message queue of customer tasks, whereas the present invention uses it for single-node fault tolerance when executing timed tasks. The present invention does not use Redis, and the patent document has no timed scheduling strategy.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a distributed data analysis task scheduling system. The technical problem to be solved is how to make full use of the strong, flexible data analysis of the R platform and of distributed resource management services in a big data environment to build a resource-based distributed data analysis task scheduling system that is convenient for data analysts to use in industrial analysis.
A distributed data analysis task scheduling system provided according to the present invention includes:
Distributed data storage service module: stores data in a non-relational database, retrieves data through a distributed search engine, and provides a distributed data storage service;
Resource-based distributed task scheduling engine module: performs resource management, resource control, and task scheduling and tracking, and provides a task scheduling service;
Distributed message queue module: implements data publishing and subscription through a distributed message queue;
Distributed application coordination service module: provides fault tolerance for the subsequent execution of automatic execution engine tasks on a single node;
Automatic execution engine module: analyzes data analysis tasks.
Preferably, the non-relational database is HBase, and the distributed search engine is the search application server Solr.
Preferably, the distributed data storage service module is the data source of analysis tasks.
Preferably, the distributed data storage service module is the storage carrier for task files and data analysis results.
Preferably, the resource-based distributed task scheduling engine module performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
Preferably, the automatic execution engine module analyzes data analysis tasks continuously in two modes, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
Preferably, in the system initialization stage, the application on each node loads from the distributed data storage service module the data analysis tasks that are in the running state under the timed scheduling engine, creates a task change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper;
an SDK interface is provided for calling the distributed data storage service module to obtain data sources converted into the data.frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier;
changes to running tasks and to the automatic scheduling engine's policy are broadcast in the cluster through the distributed message queue, and each node subscribes to the task change topic and updates the automatic scheduling engine's policy on its own node.
Preferably, running-state data tasks are loaded at initialization so that task execution can continue after a disaster affecting the whole cluster.
Preferably, the resource manager keeps the memory and CPU usage of each individual task from exceeding the amounts the task requested, and provides execution logs and status queries.
Preferably, analysis script files are obtained from the distributed storage service and broadcast in the cluster through HDFS.
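The fault tolerance that the coordination service gives the automatic scheduling engine can be pictured as leader election: the reliability discussion later in the description states that only the node holding the ZooKeeper Leader role executes data analysis tasks and that a new Leader is elected after a single-node failure. The sketch below is a minimal in-memory simulation under that reading. `Coordinator`, `join`, `fail`, `leader`, and `run_due_task` are hypothetical names, not the ZooKeeper client API, and picking the longest-registered live node as leader is an assumed rule chosen for illustration.

```python
class Coordinator:
    """In-memory stand-in for a coordination service (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}  # node name -> sequence number assigned at join time

    def join(self, node: str) -> None:
        # Register a node and give it the next sequence number.
        self._members[node] = self._next_seq
        self._next_seq += 1

    def fail(self, node: str) -> None:
        # A failed node's registration disappears, like a session expiring.
        self._members.pop(node, None)

    def leader(self):
        # Assumed rule: the live member with the smallest sequence number leads.
        if not self._members:
            return None
        return min(self._members, key=self._members.get)


def run_due_task(coordinator: Coordinator, node: str, task: str):
    """Execute the task only on the leader; every other node skips it."""
    if coordinator.leader() == node:
        return f"{node} executed {task}"
    return None
```

With nodes A to D registered in that order, A runs the due task; once `fail("A")` is called, the same call made on B starts returning a result, which is the failover behavior the description attributes to the coordination service.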
A distributed data analysis task scheduling method provided according to the present invention includes:
Distributed data storage service step: storing data in a non-relational database, retrieving data through a distributed search engine, and providing a distributed data storage service;
Resource-based distributed task scheduling engine step: performing resource management, resource control, and task scheduling and tracking, and providing a task scheduling service;
Distributed message queue step: implementing data publishing and subscription through a distributed message queue;
Distributed application coordination service step: providing fault tolerance for the subsequent execution of automatic execution engine tasks on a single node;
Automatic execution engine step: analyzing data analysis tasks.
Preferably, the non-relational database is HBase, and the distributed search engine is the search application server Solr.
Preferably, the distributed data storage service step provides the data source of analysis tasks.
Preferably, the distributed data storage service step provides the storage carrier for task files and data analysis results.
Preferably, the resource-based distributed task scheduling engine step performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
Preferably, the automatic execution engine step analyzes data analysis tasks continuously in two modes, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
Preferably, in the method initialization stage, the application on each node loads from the distributed data storage service the data analysis tasks that are in the running state under the timed scheduling engine, creates a task change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper;
an SDK interface is provided for calling the distributed data storage service to obtain data sources converted into the data.frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier;
changes to running tasks and to the automatic scheduling engine's policy are broadcast in the cluster through the distributed message queue, and each node subscribes to the task change topic and updates the automatic scheduling engine's policy on its own node.
Preferably, running-state data tasks are loaded at initialization so that task execution can continue after a disaster affecting the whole cluster.
Preferably, the resource manager keeps the memory and CPU usage of each individual task from exceeding the amounts the task requested, and provides execution logs and status queries.
Preferably, analysis script files are obtained from the distributed storage service and broadcast in the cluster through HDFS.
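The broadcast of task and policy changes over a task change topic, with every node subscribing and updating its local scheduling policy, can be sketched as follows. This is a toy in-memory model, not the Kafka client API: `InMemoryBroker`, `Node`, the topic name `task-change`, and the message fields `action`, `task_id`, and `schedule` are all assumptions made for illustration.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy publish/subscribe broker (hypothetical; not the Kafka client API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)


class Node:
    """A cluster node holding its local copy of the scheduling policy."""

    def __init__(self, name, broker, topic):
        self.name = name
        self.policy = {}  # task id -> schedule description
        broker.subscribe(topic, self.on_change)

    def on_change(self, message):
        # Apply one task-change message to the local policy.
        if message["action"] == "upsert":
            self.policy[message["task_id"]] = message["schedule"]
        elif message["action"] == "delete":
            self.policy.pop(message["task_id"], None)
```

Because every node applies the same stream of change messages, all local policies converge on the same content, so whichever node ends up executing tasks already holds the current schedule.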
Compared with the prior art, the present invention has the following beneficial effects:
The present invention proposes a distributed task scheduling framework for data analysis in the R language. It uses a resource management platform for distributed resource management and thereby achieves distributed scheduling and execution of R analyses. An automatic scheduling engine invokes tasks on data shards automatically, meeting the automated analysis needs of industrial process data tracking.
By adopting a distributed scheduling architecture, the present invention is fault-tolerant and scalable. It can provide reliable assurance for industrial process data analysis and can meet the growing resource demands of data analysis tasks through horizontal scaling. It ultimately brings out the value of process data and historical data, supports enterprise decision-making, provides a data foundation for intelligent manufacturing processes, and supports enterprise transformation.
One main focus of the present invention is data analysis that is based on data partitioning, applies the same computation with different parameters, and is tracked continuously and frequently. Without making the R analysis programs that users write any harder to develop, it provides a task scheduling service architecture with strong availability, fault tolerance, and scalability that can execute script tasks in parallel.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawing:
Fig. 1 is a schematic structural diagram of the system provided by the present invention.
Detailed description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention but do not limit it in any way. It should be noted that those of ordinary skill in the art can make various changes and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention.
The architecture of the distributed data analysis task scheduling system is shown in Fig. 1 and consists mainly of the following modules:
Distributed data storage service module: the distributed storage service stores data in the NoSQL database HBase and retrieves data through the distributed search engine Solr, providing large capacity, high reliability, high performance, safe data replicas, and strong dynamic scalability. This module can serve as the data source of analysis tasks and also as the storage carrier for task files and data analysis results.
Resource-based distributed task scheduling engine module: the task scheduling engine performs resource management, resource control, and task scheduling and tracking based on YARN.
Distributed message queue module: the distributed message queue uses the open-source Kafka message queue to implement data publishing and subscription. The message queue must provide high throughput, high reliability, and persistence so that data is delivered reliably.
Distributed application coordination service module: based on the open-source ZooKeeper. In this system it is mainly used to provide fault tolerance for the subsequent execution of automatic execution engine tasks on a single node.
Automatic execution engine module: the automatic execution engine is based on the open-source Quartz framework. Because some industrial process data needs to be tracked continuously, the engine provides two modes, periodic access and timed triggering, to analyze data analysis tasks continuously. The distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
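The two trigger modes of the automatic execution engine can be illustrated by how the next run time is computed in each. The sketch below assumes that "periodic access" means a fixed interval between runs and that "timed triggering" means a fixed time of day; the source does not define either mode precisely. The function names are hypothetical, and this is plain Python rather than the Quartz API.

```python
from datetime import datetime, time, timedelta


def next_interval_fire(last_fire: datetime, interval: timedelta) -> datetime:
    """Fixed-interval mode: the next run is one interval after the previous run."""
    return last_fire + interval


def next_daily_fire(now: datetime, at: time) -> datetime:
    """Timed mode: the next run is the next occurrence of a fixed time of day."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

A task that tracks process data every 15 minutes would use the first function, while a nightly summary at 02:30 would use the second.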
The flow of distributed data analysis task scheduling is as follows:
(1) System initialization: the application on each node loads from the distributed data storage service the data analysis tasks that are in the running state under the timed scheduling engine, creates a task change TOPIC in the distributed message queue, and loads the ZooKeeper service. Running-state data tasks are loaded at initialization so that task execution can continue after a disaster affecting the whole cluster (for example, a power failure of the whole cluster).
(2) An SDK interface is provided for calling the distributed data storage service to obtain data sources converted into the data.frame structure, the core data structure of R. Data analysis results (analysis images, analysis files, and so on) can also be saved to the distributed storage carrier.
(3) Changes to running tasks and to the automatic scheduling engine's policy are broadcast in the cluster through the distributed message queue. Each node subscribes to the task change TOPIC and updates the automatic scheduling engine's policy on its own node.
(4) The YARN-based resource manager keeps the memory and CPU usage of each individual task from exceeding the amounts the task requested, and provides execution logs and status queries. Analysis script files are obtained from the distributed storage service and broadcast in the cluster through HDFS.
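Step (4) can be illustrated with a small check that compares a task's observed usage with the amounts it requested and records a status line. This is only a sketch and not the YARN API: `ResourceRequest`, `ResourceUsage`, `exceeds_request`, and `enforce` are hypothetical names, and ending a task that overruns its request is an assumed policy that the source does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """Resources a task applied for (hypothetical structure)."""
    memory_mb: int
    vcores: int


@dataclass(frozen=True)
class ResourceUsage:
    """Resources a running task is currently observed to use."""
    memory_mb: int
    vcores: int


def exceeds_request(request: ResourceRequest, usage: ResourceUsage) -> bool:
    """Return True when usage goes beyond the request in any dimension."""
    return usage.memory_mb > request.memory_mb or usage.vcores > request.vcores


def enforce(request: ResourceRequest, usage: ResourceUsage, log: list) -> str:
    """Append a status line to the execution log and return the task state."""
    if exceeds_request(request, usage):
        log.append(
            f"KILLED: used {usage.memory_mb} MB / {usage.vcores} vcores, "
            f"requested {request.memory_mb} MB / {request.vcores} vcores"
        )
        return "KILLED"
    log.append("RUNNING: within requested resources")
    return "RUNNING"
```

The accumulated `log` list stands in for the execution log and status query that the resource manager is said to provide.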
Analysis of the characteristics of the distributed task scheduling flow:
(1) High reliability: the system uses a distributed architecture with no single point of failure. The open-source big data technologies underlying the distributed storage service, the distributed message queue, and the distributed application coordination service (HBase, Kafka, Solr, and ZooKeeper) each have their own fault-tolerance mechanisms. For the automatic scheduling engine on each node, the system relies on the uniqueness of the ZooKeeper Leader: only the node where the Leader resides executes data analysis tasks. After a single-node failure, ZooKeeper elects a new Leader, so ZooKeeper provides fault tolerance for task execution under the automatic scheduling engine's policy. Task files are likewise kept in the distributed storage service, so their storage has no single point of failure either.
(2) Easy horizontal scaling: because the system architecture is distributed, the cluster can be scaled out simply by adding nodes, to accommodate data analysis tasks that need more resources.
(3) Task resource isolation: the resource isolation of the YARN resource management system resolves contention for resources between data analysis tasks and prevents one task from monopolizing system resources and leaving other tasks waiting indefinitely to run.
(4) Data sharding and parallel task execution: using the HashKey rule defined for data storage, parameters are passed to the R script and the distributed data storage service is queried by HashKey, which shards the data. A task on the original large data set is split into subtasks that are scheduled and executed in parallel in the distributed task scheduling system. This retains R's strength in data analysis while making up for its weakness in data management.
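Item (4) can be sketched as splitting rows by a hash of their key and running the same analysis on every shard in parallel. The code below is an illustrative Python stand-in, not the patent's R or HBase code: the CRC32-based `shard_of` rule, the `(hash_key, value)` row shape, and the mean computed in `analyze` are all assumptions, and a thread pool takes the place of cluster-wide scheduling.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def shard_of(hash_key: str, shard_count: int) -> int:
    """Map a storage key to a shard with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(hash_key.encode("utf-8")) % shard_count


def split_by_key(rows, shard_count):
    """Group (hash_key, value) rows into shard_count shards."""
    shards = [[] for _ in range(shard_count)]
    for key, value in rows:
        shards[shard_of(key, shard_count)].append((key, value))
    return shards


def analyze(shard):
    """Stand-in for one analysis subtask: the same computation on different data."""
    values = [value for _, value in shard]
    return mean(values) if values else None


def run_parallel(rows, shard_count=4):
    """Split the data set into shards and analyze them concurrently."""
    shards = split_by_key(rows, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return list(pool.map(analyze, shards))
```

Each subtask sees only its own shard, which mirrors the idea that an R script receives a key parameter and queries just the slice of data it is responsible for.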
In a preferred embodiment, the following configuration is used:
(1) Four x86 servers (named A, B, C, and D) are provided, each with at least 64 GB of memory; the recommended minimum CPU is the E2650.
(2) Deploy the distributed message queue service: Kafka is deployed on all four machines A, B, C, and D, and the cluster configuration is completed.
(3) Deploy the distributed application coordination service: ZooKeeper is deployed on all four machines A, B, C, and D, and the cluster configuration is completed.
(4) Deploy the distributed storage service: the HBase master is deployed on node A and a RegionServer on each of nodes B, C, and D. The Hadoop environment is configured at the same time: the Hadoop NameNode is deployed on node A and a DataNode on each of nodes B, C, and D, and the cluster configuration is completed.
(5) Deploy the distributed task scheduling service: YARN is deployed on all four machines A, B, C, and D, and the cluster configuration is completed.
Those skilled in the art know that, besides implementing the system provided by the present invention and its devices, modules, and units purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules, and units realize the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system provided by the present invention and its devices, modules, and units can therefore be regarded as a hardware component, and the devices, modules, and units it contains for realizing various functions can be regarded as structures within that hardware component. The devices, modules, and units for realizing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Where there is no conflict, the embodiments of this application and the features in the embodiments can be combined with one another arbitrarily.
Claims (10)
- 1. A distributed data analysis task scheduling system, characterized by comprising: a distributed data storage service module, which stores data in a non-relational database, retrieves data through a distributed search engine, and provides a distributed data storage service; a resource-based distributed task scheduling engine module, which performs resource management, resource control, and task scheduling and tracking, and provides a task scheduling service; a distributed message queue module, which implements data publishing and subscription through a distributed message queue; a distributed application coordination service module, which provides fault tolerance for the subsequent execution of automatic execution engine tasks on a single node; and an automatic execution engine module, which analyzes data analysis tasks.
- 2. The distributed data analysis task scheduling system according to claim 1, characterized in that the non-relational database is HBase and the distributed search engine is the search application server Solr.
- 3. The distributed data analysis task scheduling system according to claim 1, characterized in that the distributed data storage service module is the data source of analysis tasks.
- 4. The distributed data analysis task scheduling system according to claim 1, characterized in that the distributed data storage service module is the storage carrier for task files and data analysis results.
- 5. The distributed data analysis task scheduling system according to claim 1, characterized in that the resource-based distributed task scheduling engine module performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
- 6. The distributed data analysis task scheduling system according to claim 1, characterized in that the automatic execution engine module analyzes data analysis tasks continuously in two modes, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
- 7. The distributed data analysis task scheduling system according to claim 1, characterized in that, in the system initialization stage, the application on each node loads from the distributed data storage service module the data analysis tasks that are in the running state under the timed scheduling engine, creates a task change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper; an SDK interface is provided for calling the distributed data storage service module to obtain data sources converted into the data.frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier; and changes to running tasks and to the automatic scheduling engine's policy are broadcast in the cluster through the distributed message queue, with each node subscribing to the task change topic and updating the automatic scheduling engine's policy on its own node.
- 8. The distributed data analysis task scheduling system according to claim 7, characterized in that running-state data tasks are loaded at initialization so that task execution can continue after a disaster affecting the whole cluster.
- 9. The distributed data analysis task scheduling system according to claim 7, characterized in that the resource manager keeps the memory and CPU usage of each individual task from exceeding the amounts the task requested, and provides execution logs and status queries.
- 10. The distributed data analysis task scheduling system according to claim 9, characterized in that analysis script files are obtained from the distributed storage service and broadcast in the cluster through HDFS.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610712300.1A CN107766147A (en) | 2016-08-23 | 2016-08-23 | Distributed data analysis task scheduling system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107766147A true CN107766147A (en) | 2018-03-06 |
Family
ID=61264393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610712300.1A Pending CN107766147A (en) | 2016-08-23 | 2016-08-23 | Distributed data analysis task scheduling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766147A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985981A (en) * | 2018-06-28 | 2018-12-11 | 北京奇虎科技有限公司 | Data processing system and method |
CN109033196A (en) * | 2018-06-28 | 2018-12-18 | 北京奇虎科技有限公司 | A kind of distributed data scheduling system and method |
CN109194538A (en) * | 2018-08-03 | 2019-01-11 | 平安科技(深圳)有限公司 | Test method, device, server and storage medium based on distributed coordination |
CN109669925A (en) * | 2018-11-21 | 2019-04-23 | 北京市天元网络技术股份有限公司 | The management method and device of unstructured data |
CN109919749A (en) * | 2019-03-29 | 2019-06-21 | 北京思特奇信息技术股份有限公司 | A kind of account checking method, system, storage medium and computer equipment |
CN110442644A (en) * | 2019-07-08 | 2019-11-12 | 深圳壹账通智能科技有限公司 | Block chain data filing storage method, device, computer equipment and storage medium |
CN111191794A (en) * | 2019-12-29 | 2020-05-22 | 广东浪潮大数据研究有限公司 | Training task processing method, device and equipment and readable storage medium |
CN111580954A (en) * | 2020-04-01 | 2020-08-25 | 中国科学院信息工程研究所 | Extensible distributed data acquisition method and system |
CN112052093A (en) * | 2020-09-08 | 2020-12-08 | 哈尔滨工业大学 | Experimental big data resource allocation management system based on message queue technology |
CN112052095A (en) * | 2020-09-11 | 2020-12-08 | 成都深思科技有限公司 | Distributed high-availability big data mining task scheduling system |
CN112162841A (en) * | 2020-09-30 | 2021-01-01 | 重庆长安汽车股份有限公司 | Distributed scheduling system, method and storage medium for big data processing |
CN114416601A (en) * | 2022-03-30 | 2022-04-29 | 南京赛宁信息技术有限公司 | Network security information acquisition engine and task management system and method |
CN116502868A (en) * | 2023-06-25 | 2023-07-28 | 北京云行在线软件开发有限责任公司 | Distributed scheduling engine system and distributed scheduling method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036025A (en) * | 2014-06-27 | 2014-09-10 | 蓝盾信息安全技术有限公司 | Distribution-base mass log collection system |
US20150294251A1 (en) * | 2014-04-11 | 2015-10-15 | Nec Europe Ltd. | Distributed task scheduling using multiple agent paradigms |
CN105205169A (en) * | 2015-10-12 | 2015-12-30 | 中国电子科技集团公司第二十八研究所 | Distributed image index and retrieval method |
CN105468735A (en) * | 2015-11-23 | 2016-04-06 | 武汉虹旭信息技术有限责任公司 | Stream preprocessing system and method based on mass information of mobile internet |
CN105791354A (en) * | 2014-12-23 | 2016-07-20 | 中兴通讯股份有限公司 | Job scheduling method and cloud scheduling server |
CN105786864A (en) * | 2014-12-24 | 2016-07-20 | 国家电网公司 | Offline analysis method for massive data |
-
2016
- 2016-08-23 CN CN201610712300.1A patent/CN107766147A/en active Pending
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985981B (en) * | 2018-06-28 | 2021-04-23 | 北京奇虎科技有限公司 | Data processing system and method |
CN109033196A (en) * | 2018-06-28 | 2018-12-18 | 北京奇虎科技有限公司 | A kind of distributed data scheduling system and method |
CN108985981A (en) * | 2018-06-28 | 2018-12-11 | 北京奇虎科技有限公司 | Data processing system and method |
CN109194538A (en) * | 2018-08-03 | 2019-01-11 | 平安科技(深圳)有限公司 | Test method, device, server and storage medium based on distributed coordination |
CN109669925A (en) * | 2018-11-21 | 2019-04-23 | 北京市天元网络技术股份有限公司 | The management method and device of unstructured data |
CN109919749A (en) * | 2019-03-29 | 2019-06-21 | 北京思特奇信息技术股份有限公司 | A kind of account checking method, system, storage medium and computer equipment |
CN110442644A (en) * | 2019-07-08 | 2019-11-12 | 深圳壹账通智能科技有限公司 | Block chain data filing storage method, device, computer equipment and storage medium |
CN111191794A (en) * | 2019-12-29 | 2020-05-22 | 广东浪潮大数据研究有限公司 | Training task processing method, device and equipment and readable storage medium |
CN111191794B (en) * | 2019-12-29 | 2023-03-14 | 广东浪潮大数据研究有限公司 | Training task processing method, device and equipment and readable storage medium |
CN111580954A (en) * | 2020-04-01 | 2020-08-25 | 中国科学院信息工程研究所 | Extensible distributed data acquisition method and system |
CN112052093A (en) * | 2020-09-08 | 2020-12-08 | 哈尔滨工业大学 | Experimental big data resource allocation management system based on message queue technology |
CN112052095A (en) * | 2020-09-11 | 2020-12-08 | 成都深思科技有限公司 | Distributed high-availability big data mining task scheduling system |
CN112052095B (en) * | 2020-09-11 | 2024-04-19 | 成都锋卫科技有限公司 | Distributed high-availability big data mining task scheduling system |
CN112162841A (en) * | 2020-09-30 | 2021-01-01 | 重庆长安汽车股份有限公司 | Distributed scheduling system, method and storage medium for big data processing |
CN114416601A (en) * | 2022-03-30 | 2022-04-29 | 南京赛宁信息技术有限公司 | Network security information acquisition engine and task management system and method |
CN114416601B (en) * | 2022-03-30 | 2022-07-19 | 南京赛宁信息技术有限公司 | Network security information acquisition engine and task management system and method |
CN116502868A (en) * | 2023-06-25 | 2023-07-28 | 北京云行在线软件开发有限责任公司 | Distributed scheduling engine system and distributed scheduling method |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107766147A (en) | Distributed data analysis task scheduling system | |
US10606711B2 (en) | Recovery strategy for a stream processing system | |
US11086688B2 (en) | Managing resource allocation in a stream processing framework | |
US9965330B2 (en) | Maintaining throughput of a stream processing framework while increasing processing load | |
JP6523354B2 (en) | State machine builder with improved interface and handling of state independent events | |
US9619430B2 (en) | Active non-volatile memory post-processing | |
US9680893B2 (en) | Method and system for event state management in stream processing | |
US10783436B2 (en) | Deep learning application distribution | |
US9323580B2 (en) | Optimized resource management for map/reduce computing | |
CN105045607A (en) | Method for achieving uniform interface of multiple big data calculation frames | |
CN102169505A (en) | Recommendation system building method based on cloud computing | |
CN104008012A (en) | High-performance MapReduce realization mechanism based on dynamic migration of virtual machine | |
dos Anjos et al. | Smart: An application framework for real time big data analysis on heterogeneous cloud environments | |
Agrawal et al. | Adaptive real‐time anomaly detection in cloud infrastructures | |
US11422858B2 (en) | Linked workload-processor-resource-schedule/processing-system—operating-parameter workload performance system | |
US11489942B2 (en) | Time series data analysis | |
Khan et al. | Towards enhancing the capability of IoT applications by utilizing cloud computing concept | |
US11042530B2 (en) | Data processing with nullable schema information | |
Khan | Hadoop performance modeling and job optimization for big data analytics | |
US20220374283A1 (en) | System For Optimizing Resources For Cloud-Based Scalable Distributed Search Data Analytics Service | |
Bengre et al. | A learning-based scheduler for high volume processing in data warehouse using graph neural networks | |
US8838414B2 (en) | Determining when to create a prediction based on deltas of metric values | |
US10171378B2 (en) | System and method for allocating and reserving supervisors in a real-time distributed processing platform | |
CN116389502B (en) | Cross-cluster scheduling system, method, device, computer equipment and storage medium | |
US20240012667A1 (en) | Resource prediction for microservices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180306 |