CN107766147A - Distributed data analysis task scheduling system - Google Patents

Info

Publication number
CN107766147A
Authority
CN
China
Prior art keywords
distributed
task
data
data analysis
task scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610712300.1A
Other languages
Chinese (zh)
Inventor
孙冬雪
万英杰
李娟�
史宁
鲍远松
黄明
李亚贝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Baosight Software Co Ltd
Original Assignee
Shanghai Baosight Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Baosight Software Co Ltd
Priority to CN201610712300.1A
Publication of CN107766147A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Abstract

The invention provides a distributed data analysis task scheduling system comprising a distributed data storage service module, a resource-based distributed task scheduling engine module, a distributed message queue module, a distributed application coordination service module, and an automatic execution engine module. The invention proposes a distributed task scheduling framework for data analysis written in the R language. It uses a resource management platform for distributed resource management and thereby achieves distributed scheduling and execution of R analyses. An automatic scheduling engine automatically invokes tasks on data shards, meeting the need for automated analysis in industrial process data tracking.

Description

Distributed data analysis task scheduling system
Technical field
The present invention relates to data analysis task scheduling, and in particular to a distributed data analysis task scheduling system that can be widely applied to scheduling the execution of data analysis formulas on industrial process data.
Background
With the continued advance of Industry 4.0 and the development of technologies such as the Internet of Things, object-tracking devices such as mobile devices and RFID are used ever more widely in industrial production, and explosive growth of data is becoming the trend. At the same time, the push toward fine-grained enterprise management requires analysis of larger data volumes and broader data dimensions to support business decisions. After years of using enterprise information systems, many enterprises have accumulated large amounts of historical data, but this data is not sufficiently analysed and used. How can the large volume of existing and newly added business data be fully mined and used? A comparative study of traditional industrial BI tools shows the following main deficiencies:
(1) Large numbers of data analysis tasks cannot be automatically scheduled in a distributed manner, so tasks often pile up on a single node, which consumes excessive node resources and lowers execution efficiency.
(2) Data analysis tasks are subject to single-node failure: when an analysis task running on a single node fails, there is no mechanism to restart it.
(3) An effective timed task scheduling function is lacking, so the scheduling needs of analysis tasks that run at a fixed period or fixed interval cannot be met.
(4) The storage of source data and analysis results generally does not use a distributed architecture, so results are vulnerable to a single-machine failure of the result storage node, which may cause data loss.
R is an open-source language developed specifically for statistics and data analysis. It is compatible with many operating systems, is concise to program in, and is a programming platform favoured by statistical analysts. Its collection of mature data mining packages is rich and still growing, and it has powerful modules for visualising analysis results, such as the layered graphics of ggplot. Its shortcomings are that it handles large text files poorly and that, although its data analysis components are strong, its data management is lacking, so data often has to be split in an external environment before being returned to the R platform for analysis.
Among published papers, "Application of the R language in big data processing" by Yang Xia and Wu Dongwei mainly introduces the features and usage of the RHadoop extension packages from Revolution Analytics, with which Map-Reduce programs can be written in R; "Integrated technology and its realization based on R language and Hadoop" by Liu Wenfei mainly describes integrating and executing R programs by way of Hadoop Streaming. The patent document with application number CN201610074884.4, titled "Task sending system of a distributed computing framework", discloses a task sending system for a distributed computing framework that includes an application server, a task queue service platform and a Redis service platform. The application server is used to deploy multiple business processing services. The task queue service platform consists of multiple task servers clustered over a network; a ZooKeeper service is deployed on the task queue service platform, and task scheduling operations handle new client tasks in the message queue of the ZooKeeper service. The Redis service platform consists of multiple Redis servers connected over a network and is connected to the task queue service platform via the network. The Redis service platform invokes a processing process for each new client task added to the message queue; the processing process cleans the client task and outputs a first business result, which is stored in the Redis cache. When the real-time computing module of the Redis service platform detects a new first business result in the Redis cache, it computes on the first business result and outputs a second business result.
Comparison of technical points: compared with that patent document, the technical structure of the present invention is clearly different. Although both use a ZooKeeper service, the purposes differ: in that invention it is used for the message queue of client tasks, whereas in the present invention it is used for single-node fault tolerance when executing timed tasks. The present invention does not use Redis, and that patent document has no timed scheduling strategy.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a distributed data analysis task scheduling system. The technical problem to be solved, and the focus of the invention, is how to make full use of the strong and flexible data analysis of the R platform together with the distributed resource management services of a big data environment to build a resource-based distributed data analysis task scheduling system that is convenient for data analysts to use in industrial analysis.
A distributed data analysis task scheduling system provided according to the present invention includes:
Distributed data storage service module: stores data in a non-relational database, retrieves data through a distributed search engine, and provides a distributed data storage service;
Resource-based distributed task scheduling engine module: performs resource management, resource control, and task scheduling and tracking, and provides a task scheduling service;
Distributed message queue module: implements data publish and subscribe functions through a distributed message queue;
Distributed application coordination service module: provides fault tolerance for the continued execution of automatic execution engine tasks on a single node;
Automatic execution engine module: performs analysis for data analysis tasks.
Preferably, the non-relational database is HBase and the distributed search engine is the search application server Solr.
Preferably, the distributed data storage service module is the data source for analysis tasks.
Preferably, the distributed data storage service module is the storage carrier for task files and data analysis results.
Preferably, the resource-based distributed task scheduling engine module performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
Preferably, the automatic execution engine module performs continuous analysis for data analysis tasks in two ways, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
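As an illustration of the two trigger modes named above, the following Python sketch computes the next firing time for a periodic trigger and for a fixed-time daily trigger. It is only a sketch under stated assumptions: the class names `PeriodicTrigger` and `DailyTimedTrigger` are hypothetical, and the code does not reproduce the API of any particular scheduling framework.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class PeriodicTrigger:
    """Hypothetical periodic trigger: fires every `interval`, starting at `start`."""
    start: datetime
    interval: timedelta

    def next_fire(self, now: datetime) -> datetime:
        # Before the start time, the first firing is the start itself.
        if now < self.start:
            return self.start
        # Otherwise pick the first period boundary strictly after `now`.
        periods = (now - self.start) // self.interval + 1
        return self.start + periods * self.interval


@dataclass(frozen=True)
class DailyTimedTrigger:
    """Hypothetical timed trigger: fires once a day at a fixed wall-clock time."""
    at: time

    def next_fire(self, now: datetime) -> datetime:
        candidate = datetime.combine(now.date(), self.at)
        # If today's firing time has already passed, use tomorrow's.
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate
```

A scheduler loop would call `next_fire(now)` on each registered trigger and sleep until the earliest result; that loop is omitted here.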
Preferably, in the system initialization phase, the application on each node loads the running-state data analysis tasks of the timed scheduling engine from the distributed data storage service module, creates the task-change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper;
an SDK interface is provided for calling the distributed data storage service module to obtain a data source converted into the Data.Frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier;
changes to running-state tasks and to the automatic scheduling engine policy are broadcast in the cluster through the distributed message queue, and each node subscribes to the task-change topic (TOPIC) and updates the automatic scheduling engine policy on its own node.
Preferably, running-state data tasks are loaded at initialization in order to resume task execution after a previous cluster-wide disaster.
Preferably, the resource manager ensures that the memory and CPU usage of an individual task does not exceed the values the task requested, and provides execution logs and status queries.
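The resource cap described above can be pictured with a small Python sketch. It is not YARN's interface: `ResourceRequest`, `ResourceUsage` and `enforce` are hypothetical names, and the sketch only shows the comparison between what a task requested and what it is observed to use, together with a log entry that a status query could later read.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResourceRequest:
    """What the task asked for when it was submitted (hypothetical type)."""
    memory_mb: int
    vcores: int


@dataclass(frozen=True)
class ResourceUsage:
    """What the task is observed to be using (hypothetical type)."""
    memory_mb: int
    vcores: int


def exceeds_request(request: ResourceRequest, usage: ResourceUsage) -> bool:
    """True if either memory or CPU usage is above the requested value."""
    return usage.memory_mb > request.memory_mb or usage.vcores > request.vcores


def enforce(request: ResourceRequest, usage: ResourceUsage, log: List[str]) -> str:
    """Return the task status after the check and record it in the execution log."""
    status = "KILLED" if exceeds_request(request, usage) else "RUNNING"
    log.append(f"{status}: used {usage.memory_mb} MB / {usage.vcores} vcores, "
               f"requested {request.memory_mb} MB / {request.vcores} vcores")
    return status
```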
Preferably, analysis script files are obtained from the distributed storage service and broadcast in the cluster via HDFS.
A distributed data analysis task scheduling method provided according to the present invention includes:
Distributed data storage service step: stores data in a non-relational database, retrieves data through a distributed search engine, and provides a distributed data storage service;
Resource-based distributed task scheduling engine step: performs resource management, resource control, and task scheduling and tracking, and provides a task scheduling service;
Distributed message queue step: implements data publish and subscribe functions through a distributed message queue;
Distributed application coordination service step: provides fault tolerance for the continued execution of automatic execution engine tasks on a single node;
Automatic execution engine step: performs analysis for data analysis tasks.
Preferably, the non-relational database is HBase and the distributed search engine is the search application server Solr.
Preferably, the distributed data storage service step is the data source for analysis tasks.
Preferably, the distributed data storage service step is the storage carrier for task files and data analysis results.
Preferably, the resource-based distributed task scheduling engine step performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
Preferably, the automatic execution engine step performs continuous analysis for data analysis tasks in two ways, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
Preferably, in the method initialization phase, the application on each node loads the running-state data analysis tasks of the timed scheduling engine from the distributed data storage service, creates the task-change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper;
an SDK interface is provided for calling the distributed data storage service to obtain a data source converted into the Data.Frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier;
changes to running-state tasks and to the automatic scheduling engine policy are broadcast in the cluster through the distributed message queue, and each node subscribes to the task-change topic (TOPIC) and updates the automatic scheduling engine policy on its own node.
Preferably, running-state data tasks are loaded at initialization in order to resume task execution after a previous cluster-wide disaster.
Preferably, the resource manager ensures that the memory and CPU usage of an individual task does not exceed the values the task requested, and provides execution logs and status queries.
Preferably, analysis script files are obtained from the distributed storage service and broadcast in the cluster via HDFS.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention proposes a distributed task scheduling framework for data analysis written in the R language. It uses a resource management platform for distributed resource management and thereby achieves distributed scheduling and execution of R analyses. An automatic scheduling engine automatically invokes tasks on data shards, meeting the need for automated analysis in industrial process data tracking.
The present invention uses a distributed scheduling architecture that is fault tolerant and scalable. It can provide reliable guarantees for industrial process data analysis and can meet the ever-growing resource demands of data analysis tasks through horizontal scaling, so that the value of process data and historical data is better realised, supporting enterprise decision-making, providing a data foundation for intelligent manufacturing processes, and supporting enterprise transformation.
One main focus of the present invention is data analysis processes that are based on data splitting, apply the same computation to different parameters, and are tracked continuously and frequently. Without increasing the difficulty of writing R analysis programs for users, it provides a task scheduling service architecture with high availability, fault tolerance and scalability that can execute script tasks in parallel.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawing:
Fig. 1 is a structural diagram of the system provided by the invention.
Detailed description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention but do not limit it in any way. It should be noted that those of ordinary skill in the art can make changes and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention.
The architecture of the distributed data analysis task scheduling system is shown in Fig. 1 and consists mainly of the following modules:
Distributed data storage service module: the distributed storage service stores data in the NoSQL database HBase and retrieves data through the distributed search engine Solr, providing large capacity, high reliability, high performance, safe data replicas and strong dynamic scalability. This module can serve as the data source for analysis tasks and also as the storage carrier for task files and data analysis results.
Resource-based distributed task scheduling engine module: the task scheduling engine performs resource management, resource control, and task scheduling and tracking based on YARN.
Distributed message queue module: the distributed message queue uses the open-source Kafka message queue to implement data publish and subscribe functions. The message queue must provide high throughput, high reliability and persistence in order to deliver data reliably.
Distributed application coordination service module: based on the open-source ZooKeeper. In this system it is mainly used to provide fault tolerance for the continued execution of automatic execution engine tasks on a single node.
Automatic execution engine module: the automatic execution engine is based on the open-source Quartz framework. Because some industrial process data needs continuous tracking, it offers two modes, periodic access and timed triggering, for continuous analysis by data analysis tasks. The distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
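The fault tolerance that the coordination service gives the per-node scheduling engine can be sketched as leader-only execution with failover. The Python below is an in-memory stand-in, not a ZooKeeper client: the `Election` class and `run_if_leader` function are hypothetical, and the rule "the live member with the lowest join sequence number is the leader" is an assumption borrowed from the common election recipe built on sequential ephemeral nodes.

```python
from typing import Callable, Dict, Optional


class Election:
    """In-memory stand-in for a coordination service's leader election."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._members: Dict[str, int] = {}  # node name -> join sequence number

    def join(self, node: str) -> None:
        """Register a node; each join receives the next sequence number."""
        self._members[node] = self._next_seq
        self._next_seq += 1

    def leave(self, node: str) -> None:
        """Remove a node, e.g. because it failed and its session expired."""
        self._members.pop(node, None)

    def leader(self) -> Optional[str]:
        """The live node with the lowest sequence number, or None if empty."""
        if not self._members:
            return None
        return min(self._members, key=self._members.get)


def run_if_leader(election: Election, node: str, task: Callable[[], str]) -> Optional[str]:
    """Execute the scheduled task only on the node that currently holds leadership."""
    if election.leader() == node:
        return task()
    return None
```

Every node keeps the same schedule, but only the current leader actually runs a task when its trigger fires; when the leader leaves, the next node takes over at the following firing.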
The distributed data analysis task scheduling flow is as follows:
(1) System initialization: the application on each node loads the running-state data analysis tasks of the timed scheduling engine from the distributed data storage service, creates the task-change TOPIC in the distributed message queue, and loads the ZooKeeper service. Running-state data tasks are loaded at initialization in order to resume task execution after a previous cluster-wide disaster (for example, a power failure of the whole cluster).
(2) An SDK interface is provided for calling the distributed data storage service to obtain a data source converted into the Data.Frame structure, the core data structure in R; data analysis results (analysis images, analysis files and so on) can also be saved to the distributed storage carrier.
(3) Changes to running-state tasks and to the automatic scheduling engine policy are broadcast in the cluster through the distributed message queue; each node subscribes to the task-change TOPIC and updates the automatic scheduling engine policy on its own node.
(4) The YARN-based resource manager ensures that the memory and CPU usage of each individual task does not exceed the values it requested, and provides execution logs and status queries. Analysis script files are obtained from the distributed storage service and broadcast in the cluster via HDFS.
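Step (3) of the flow above, in which every node subscribes to the task-change topic and updates its local scheduling policy, can be sketched with an in-memory publish/subscribe broker. This is an illustration only: `InMemoryBroker`, `SchedulerNode`, the topic name `task-change` and the message fields `task_id`, `action` and `policy` are all hypothetical, and a real deployment would use a message queue client instead.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Minimal publish/subscribe stand-in for a distributed message queue."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> None:
        # Broadcast: every subscriber of the topic receives the message.
        for callback in list(self._subscribers[topic]):
            callback(message)


class SchedulerNode:
    """One cluster node holding a local copy of the scheduling policy."""

    def __init__(self, name: str, broker: InMemoryBroker, topic: str = "task-change") -> None:
        self.name = name
        self.policies: Dict[str, str] = {}  # task id -> trigger policy
        broker.subscribe(topic, self._on_change)

    def _on_change(self, message: dict) -> None:
        # Apply the change locally so all nodes converge on the same policy.
        if message["action"] == "remove":
            self.policies.pop(message["task_id"], None)
        else:
            self.policies[message["task_id"]] = message["policy"]
```

Publishing one change message to the topic is enough to bring the policy tables of all subscribed nodes into agreement, which is what lets any node take over scheduling after a failover.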
Characteristics of the distributed task scheduling flow:
(1) High reliability: the system uses a distributed architecture with no single point of failure. The open-source big data technologies HBase, Kafka, Solr and ZooKeeper, on which the distributed storage service, distributed message queue and distributed application coordination service are based, each have their own fault tolerance mechanisms. For the automatic scheduling engine on each node, the system relies on the uniqueness of the ZooKeeper Leader: only the node where the Leader resides executes data analysis tasks. After a single-point failure, ZooKeeper elects a new Leader, so ZooKeeper provides fault tolerance for task execution driven by the automatic scheduling engine policy. Task files are likewise stored in the distributed storage service, so their storage has no single point of failure either.
(2) Easy horizontal scaling: because the system architecture is distributed, the cluster can be scaled out simply by adding nodes, to meet the demand of data analysis tasks for more resources.
(3) Task resource isolation: the resource isolation of the YARN resource management system resolves contention for resources between data analysis tasks and prevents any one task from monopolising system resources and leaving other tasks waiting a long time to execute.
(4) Data sharding and parallel task execution: using the HashKey rule defined for data storage, parameters passed into the R script, and HashKey queries against the distributed data storage service, data is sharded and an original large-dataset task is split into subtasks that are scheduled and executed in parallel in the distributed task scheduling system. This retains the analytical strength of R while making up for its weakness in data management.
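Characteristic (4) above, splitting a large task by hash key and running the pieces in parallel, can be sketched as follows. The sketch makes several assumptions that the text does not state: the shard is chosen as CRC32 of the key modulo the shard count, the `analyse` function is a placeholder for the R script (here it just sums values), and a local thread pool stands in for cluster-wide scheduling.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, float]


def shard_of(key: str, num_shards: int) -> int:
    """Map a record key to a shard with a deterministic hash (assumed rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def split_by_hash_key(records: List[Record], num_shards: int) -> List[List[Record]]:
    """Cut one large data set into `num_shards` disjoint subsets."""
    shards: List[List[Record]] = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_of(key, num_shards)].append((key, value))
    return shards


def analyse(shard: List[Record]) -> float:
    """Placeholder for the analysis script: the same computation on every shard."""
    return sum(value for _, value in shard)


def run_sharded(records: List[Record], num_shards: int) -> List[float]:
    """Run the same analysis on every shard in parallel and collect the results."""
    shards = split_by_hash_key(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(analyse, shards))
```

Because each record lands in exactly one shard, the per-shard results can be combined afterwards without double counting.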
In a preferred embodiment, the following configuration is used:
(1) Four x86 servers (named A, B, C and D) are provided, each with at least 64 GB of memory; the recommended minimum CPU is an E2650.
(2) Deploy the distributed message queue service: Kafka is deployed on all four machines A, B, C and D, and the cluster configuration is completed.
(3) Deploy the distributed application coordination service: ZooKeeper is deployed on all four machines A, B, C and D, and the cluster configuration is completed.
(4) Deploy the distributed storage service: the HBase Master is deployed on node A and a RegionServer on each of nodes B, C and D. The Hadoop environment is configured at the same time: the Hadoop NameNode is deployed on node A and a DataNode on each of nodes B, C and D, and the cluster configuration is completed.
(5) Deploy the distributed task scheduling service: YARN is deployed on all four machines A, B, C and D, and the cluster configuration is completed.
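The role placement in steps (2) to (5) above can be written down as a small data structure, which makes it easy to check which nodes carry which service. The role labels such as `hbase-master` and `hdfs-datanode` are hypothetical names chosen for this sketch, not configuration keys of any of the products mentioned.

```python
from typing import Dict, List, Set

# Node name -> services placed on it, following the preferred embodiment.
DEPLOYMENT: Dict[str, Set[str]] = {
    "A": {"kafka", "zookeeper", "hbase-master", "hdfs-namenode", "yarn"},
    "B": {"kafka", "zookeeper", "hbase-regionserver", "hdfs-datanode", "yarn"},
    "C": {"kafka", "zookeeper", "hbase-regionserver", "hdfs-datanode", "yarn"},
    "D": {"kafka", "zookeeper", "hbase-regionserver", "hdfs-datanode", "yarn"},
}


def nodes_running(role: str) -> List[str]:
    """Return the sorted names of the nodes that host the given role."""
    return sorted(node for node, roles in DEPLOYMENT.items() if role in roles)
```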
Those skilled in the art know that, besides implementing the system provided by the invention and its devices, modules and units purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules and units achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The system provided by the invention and its devices, modules and units can therefore be regarded as a hardware component, and the devices, modules and units it contains for realising various functions can be regarded as structures within the hardware component; devices, modules and units for realising various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.

Claims (10)

  1. A distributed data analysis task scheduling system, characterized by comprising:
    a distributed data storage service module, which stores data in a non-relational database, retrieves data through a distributed search engine, and provides a distributed data storage service;
    a resource-based distributed task scheduling engine module, which performs resource management, resource control, and task scheduling and tracking, and provides a task scheduling service;
    a distributed message queue module, which implements data publish and subscribe functions through a distributed message queue;
    a distributed application coordination service module, which provides fault tolerance for the continued execution of automatic execution engine tasks on a single node;
    an automatic execution engine module, which performs analysis for data analysis tasks.
  2. The distributed data analysis task scheduling system according to claim 1, characterized in that the non-relational database is HBase and the distributed search engine is the search application server Solr.
  3. The distributed data analysis task scheduling system according to claim 1, characterized in that the distributed data storage service module is the data source for analysis tasks.
  4. The distributed data analysis task scheduling system according to claim 1, characterized in that the distributed data storage service module is the storage carrier for task files and data analysis results.
  5. The distributed data analysis task scheduling system according to claim 1, characterized in that the resource-based distributed task scheduling engine module performs resource management, resource control, and task scheduling and tracking based on the resource manager YARN.
  6. The distributed data analysis task scheduling system according to claim 1, characterized in that the automatic execution engine module performs continuous analysis for data analysis tasks in two ways, periodic access and timed triggering, and the distributed application coordination service provides fault tolerance for the automatic scheduling engine on each node.
  7. The distributed data analysis task scheduling system according to claim 1, characterized in that, in the system initialization phase, the application on each node loads the running-state data analysis tasks of the timed scheduling engine from the distributed data storage service module, creates the task-change topic (TOPIC) in the distributed message queue, and loads the distributed application coordination service ZooKeeper;
    an SDK interface is provided for calling the distributed data storage service module to obtain a data source converted into the Data.Frame structure, the core data structure of the R language, and to save data analysis results to the distributed storage carrier;
    changes to running-state tasks and to the automatic scheduling engine policy are broadcast in the cluster through the distributed message queue, and each node subscribes to the task-change topic (TOPIC) and updates the automatic scheduling engine policy on its own node.
  8. The distributed data analysis task scheduling system according to claim 7, characterized in that running-state data tasks are loaded at initialization in order to resume task execution after a previous cluster-wide disaster.
  9. The distributed data analysis task scheduling system according to claim 7, characterized in that the resource manager ensures that the memory and CPU usage of an individual task does not exceed the values the task requested, and provides execution logs and status queries.
  10. The distributed data analysis task scheduling system according to claim 9, characterized in that analysis script files are obtained from the distributed storage service and broadcast in the cluster via HDFS.
CN201610712300.1A 2016-08-23 2016-08-23 Distributed data analysis task scheduling system Pending CN107766147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712300.1A CN107766147A (en) 2016-08-23 2016-08-23 Distributed data analysis task scheduling system

Publications (1)

Publication Number Publication Date
CN107766147A true CN107766147A (en) 2018-03-06

Family

ID=61264393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712300.1A Pending CN107766147A (en) 2016-08-23 2016-08-23 Distributed data analysis task scheduling system

Country Status (1)

Country Link
CN (1) CN107766147A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150294251A1 (en) * 2014-04-11 2015-10-15 Nec Europe Ltd. Distributed task scheduling using multiple agent paradigms
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-based mass log collection system
CN105791354A (en) * 2014-12-23 2016-07-20 中兴通讯股份有限公司 Job scheduling method and cloud scheduling server
CN105786864A (en) * 2014-12-24 2016-07-20 国家电网公司 Offline analysis method for massive data
CN105205169A (en) * 2015-10-12 2015-12-30 中国电子科技集团公司第二十八研究所 Distributed image index and retrieval method
CN105468735A (en) * 2015-11-23 2016-04-06 武汉虹旭信息技术有限责任公司 Stream preprocessing system and method based on mass information of mobile internet

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985981B (en) * 2018-06-28 2021-04-23 北京奇虎科技有限公司 Data processing system and method
CN109033196A (en) * 2018-06-28 2018-12-18 北京奇虎科技有限公司 Distributed data scheduling system and method
CN108985981A (en) * 2018-06-28 2018-12-11 北京奇虎科技有限公司 Data processing system and method
CN109194538A (en) * 2018-08-03 2019-01-11 平安科技(深圳)有限公司 Test method, device, server and storage medium based on distributed coordination
CN109669925A (en) * 2018-11-21 2019-04-23 北京市天元网络技术股份有限公司 Management method and device for unstructured data
CN109919749A (en) * 2019-03-29 2019-06-21 北京思特奇信息技术股份有限公司 Account checking method, system, storage medium and computer equipment
CN110442644A (en) * 2019-07-08 2019-11-12 深圳壹账通智能科技有限公司 Blockchain data filing storage method, device, computer equipment and storage medium
CN111191794A (en) * 2019-12-29 2020-05-22 广东浪潮大数据研究有限公司 Training task processing method, device and equipment and readable storage medium
CN111191794B (en) * 2019-12-29 2023-03-14 广东浪潮大数据研究有限公司 Training task processing method, device and equipment and readable storage medium
CN111580954A (en) * 2020-04-01 2020-08-25 中国科学院信息工程研究所 Extensible distributed data acquisition method and system
CN112052093A (en) * 2020-09-08 2020-12-08 哈尔滨工业大学 Experimental big data resource allocation management system based on message queue technology
CN112052095A (en) * 2020-09-11 2020-12-08 成都深思科技有限公司 Distributed high-availability big data mining task scheduling system
CN112052095B (en) * 2020-09-11 2024-04-19 成都锋卫科技有限公司 Distributed high-availability big data mining task scheduling system
CN112162841A (en) * 2020-09-30 2021-01-01 重庆长安汽车股份有限公司 Distributed scheduling system, method and storage medium for big data processing
CN114416601A (en) * 2022-03-30 2022-04-29 南京赛宁信息技术有限公司 Network security information acquisition engine and task management system and method
CN114416601B (en) * 2022-03-30 2022-07-19 南京赛宁信息技术有限公司 Network security information acquisition engine and task management system and method
CN116502868A (en) * 2023-06-25 2023-07-28 北京云行在线软件开发有限责任公司 Distributed scheduling engine system and distributed scheduling method

Similar Documents

Publication Publication Date Title
CN107766147A (en) Distributed data analysis task scheduling system
US10606711B2 (en) Recovery strategy for a stream processing system
US11086688B2 (en) Managing resource allocation in a stream processing framework
US9965330B2 (en) Maintaining throughput of a stream processing framework while increasing processing load
JP6523354B2 (en) State machine builder with improved interface and handling of state independent events
US9619430B2 (en) Active non-volatile memory post-processing
US9680893B2 (en) Method and system for event state management in stream processing
US10783436B2 (en) Deep learning application distribution
US9323580B2 (en) Optimized resource management for map/reduce computing
CN105045607A (en) Method for achieving uniform interface of multiple big data calculation frames
CN102169505A (en) Recommendation system building method based on cloud computing
CN104008012A (en) High-performance MapReduce realization mechanism based on dynamic migration of virtual machine
dos Anjos et al. Smart: An application framework for real time big data analysis on heterogeneous cloud environments
Agrawal et al. Adaptive real‐time anomaly detection in cloud infrastructures
US11422858B2 (en) Linked workload-processor-resource-schedule/processing-system—operating-parameter workload performance system
US11489942B2 (en) Time series data analysis
Khan et al. Towards enhancing the capability of IoT applications by utilizing cloud computing concept
US11042530B2 (en) Data processing with nullable schema information
Khan Hadoop performance modeling and job optimization for big data analytics
US20220374283A1 (en) System For Optimizing Resources For Cloud-Based Scalable Distributed Search Data Analytics Service
Bengre et al. A learning-based scheduler for high volume processing in data warehouse using graph neural networks
US8838414B2 (en) Determining when to create a prediction based on deltas of metric values
US10171378B2 (en) System and method for allocating and reserving supervisors in a real-time distributed processing platform
CN116389502B (en) Cross-cluster scheduling system, method, device, computer equipment and storage medium
US20240012667A1 (en) Resource prediction for microservices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-03-06