CN107066316A - Scheduling method and system for alleviating memory pressure in a distributed data processing system - Google Patents

Scheduling method and system for alleviating memory pressure in a distributed data processing system

Info

Publication number
CN107066316A
Authority
CN
China
Prior art keywords
task
memory
data
information
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710273273.7A
Other languages
Chinese (zh)
Inventor
石宣化
金海
张雄
柯志祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201710273273.7A priority Critical patent/CN107066316A/en
Publication of CN107066316A publication Critical patent/CN107066316A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/465Distributed object oriented systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a scheduling method for alleviating memory pressure in a distributed data processing system, comprising: analyzing memory usage patterns from the way each user programming interface operates on key-value pairs, and building a memory usage model for each user programming interface in the data processing system; inferring each task's memory usage model from the order in which the task calls the programming interfaces; distinguishing the different models by their memory occupation growth rate; when memory pressure is high, estimating each task's influence on memory pressure from its current memory usage model and the amount of data it has processed, and suspending the high-influence tasks until the low-influence tasks finish executing or the memory pressure is relieved. By monitoring and analyzing in real time the influence of all running tasks on memory pressure, the invention improves the scalability of the serving system.

Description

Scheduling method and system for alleviating memory pressure in a distributed data processing system
Technical field
The invention belongs to the field of distributed systems, and more particularly relates to a scheduling method and system for alleviating memory pressure in a distributed data processing system.
Background technology
Distributed data processing systems are more and more widely used in big data processing and are developing rapidly, which benefits from the fact that most kinds of data processing systems are developed in advanced object-oriented languages such as Java and C#. However, on the one hand hardware memory space is limited, and on the other hand such object-oriented languages run in managed environments such as the JVM and .NET: data resides in memory in object form, and the extra references and metadata these objects introduce make the memory bloat problem prominent. Meanwhile, the managed environment manages memory automatically through garbage collection; under memory bloat, a large number of long-lived objects survive in memory, causing the managed environment to execute garbage collection frequently, so memory resource utilization is very low and memory pressure is severe. With the development of distributed data processing systems, memory-based computing systems such as Spark and Flink cache large amounts of important intermediate data in memory to accelerate iterative applications. Therefore, in in-memory computing systems, memory pressure becomes increasingly prominent.
At present, the main method of alleviating memory pressure in in-memory computing systems is, when memory pressure is high, to randomly select a subset of all running tasks and split the data each selected task holds into four parts: local data structures, processed input, unprocessed input, and intermediate and final results; the local data structures and the processed part of the selected tasks are then released to reduce memory pressure.
However, this method has certain defects. On the one hand, the selection of tasks is random: it does not consider whether different tasks use memory differently, or whether their influence on memory pressure is the same, and in a service-oriented data processing system the memory pressure caused by some tasks affects all running tasks. On the other hand, in a service-oriented data processing system the data objects some tasks have already processed are transient objects, and spending extra overhead to release the data space those tasks occupy costs more than it gains. Therefore, existing methods are not suitable for data processing systems in the service-oriented mode.
Summary of the invention
Aiming at the defects of the prior art, the present invention provides a scheduling method for alleviating memory pressure in a distributed data processing system, which aims to solve the technical problem that existing methods cannot be applied to data processing systems in the service-oriented mode.
To achieve the above objectives, the invention provides a scheduling method for alleviating memory pressure in a data processing system.
Through the above technical scheme conceived by the present invention, compared with the prior art, the system of the invention has the following advantages and technical effects:
1. The present invention solves the technical problem that existing methods, by treating all tasks identically, cannot be applied to data processing systems in the service-oriented mode. Because steps (1) to (6) are adopted, the memory occupation growth rate of each task is computed from its execution information in the data processing system, such as the input and output information in the task information and the specific memory occupation in the memory information, and each task's influence on memory pressure is assessed from this growth rate. The invention can therefore clearly distinguish the influence of different tasks on memory pressure, process first the tasks with a small memory occupation growth rate, relieve the memory pressure in the data processing system, and avoid making tasks with little influence on memory pressure wait for a long time.
2. The present invention effectively reduces spilling in the data processing system by judging the spill condition. Because steps (7) and (8) are adopted, the memory space occupied by each task is compared with the maximum memory space the JVM can allocate to it, namely the total JVM heap space divided by the number of tasks the data processing system is currently running; when a task's occupied memory space exceeds this maximum, the task is deemed likely to spill data to disk. The invention can therefore reduce disk reads and writes and improve system performance by effectively preventing data from spilling to disk.
3. Because a task's influence on memory pressure is considered through its memory usage, the scheduling method of the invention can be extended to different data processing systems and is widely adaptable. Meanwhile, the computation of memory usage takes into account in-memory computing systems with caching technology, whose task information includes cache input information and cache output information, which further guarantees the range of applicability of the scheduling method of the invention.
4. Because the scheduling method of the invention can be embedded in an existing scheduler, it can coexist with the original scheduling method without affecting any characteristic of the original system. Meanwhile, the scheduling method applies to all big data processing systems with a managed runtime, so the implementation is general and portable.
Brief description of the drawings
Fig. 1 is a flow chart of the scheduling method for alleviating memory pressure in a distributed data processing system of the present invention.
Fig. 2 is a schematic block diagram of the scheduling system for alleviating memory pressure in a distributed data processing system of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below may be combined with each other as long as they do not conflict.
As shown in Fig. 1, the scheduling method for alleviating memory pressure in a distributed data processing system of the present invention includes the following steps:
(1) Obtain all task information and memory information in the data processing system;
Specifically, the task information resides in the data processing system, while the memory information resides both in the data processing system and in the Java virtual machine (JVM);
The task information includes four kinds of input information, namely disk/network input information, shuffle input information, cache input information, and joint (cogroup) input information, and three kinds of output information, namely disk/network output information, shuffle output information, and cache output information; joint input means the data may come from at least two of disk/network input, shuffle input, and cache input;
The memory information includes the memory space occupied by each task, the occupied space of the JVM heap, and the remaining space of the JVM heap;
(2) Obtain from the data processing system all currently running tasks and the output information and memory information corresponding to each task, and compute the memory occupation growth of each task from the obtained output information and memory information;
Specifically, if a task's output information is disk/network output information, the output is transient in memory, so the task's memory occupation growth is 0;
If a task's output information is shuffle output information or cache output information, first obtain the occupied memory space from the task's memory information in the data processing system, then subtract the task's previous occupied memory space from this value to obtain the memory occupation growth;
(3) For each of the tasks, judge whether its input source is cache input or shuffle input; if so, go to step (4), otherwise go to step (5);
(4) Obtain the total size and total record count of the data set corresponding to the task, obtain the number of records the task has processed, and compute the corresponding processed data size from the processed record count: (processed record count / total record count) * total data set size; then go to step (6);
(5) Judge whether the task's input source is joint input. If not, directly obtain the size of the data the task has processed, then go to step (6). If so, separately count the size of the data processed in each input source (the statistics for cache input and shuffle input are as described in step (4); the statistics for disk input are as described in the first half of this step), accumulate the sizes obtained for the different sources, take the sum as the processed data size, and go to step (6);
(6) Obtain the task's memory occupation growth rate from the processed data size and the memory occupation growth: growth rate = memory occupation growth / processed data size;
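As an illustrative sketch (not part of the claimed invention), the computations of steps (4) and (6) can be expressed as follows; the function and parameter names are chosen for illustration only:

```python
def processed_data_size(processed_records, total_records, total_size):
    # Step (4): (processed record count / total record count) * total data set size
    return processed_records / total_records * total_size

def memory_growth_rate(mem_now, mem_prev, processed_size):
    # Step (6): memory occupation growth / processed data size.
    # Disk/network outputs are transient, so their growth is 0 per step (2).
    growth = mem_now - mem_prev
    return growth / processed_size if processed_size > 0 else 0.0
```

For example, a task that has processed 50 of 100 records of a 1000-byte data set is credited with 500 bytes of processed data; if its occupied memory grew from 100 to 300 bytes over that span, its growth rate is 0.4.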
(7) Judge whether the occupied space of the JVM heap in the memory information exceeds the spill threshold; if so, go to step (8), otherwise go to step (9). In the present invention, the spill threshold equals 70% to 90% of the total JVM heap size, preferably 80%;
(8) Judge whether each task's occupied memory space exceeds the maximum memory space the JVM can allocate to the task (equal to the total JVM heap size divided by the total number of tasks the data processing system is currently running); if so, suspend the task, add it to the suspension queue, and end the process; otherwise move on to the next task and repeat this step;
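Steps (7) and (8) can be sketched as below; this is a minimal illustration assuming the preferred 80% spill ratio, with illustrative names:

```python
def over_spill_threshold(heap_used, heap_total, spill_ratio=0.8):
    # Step (7): the spill threshold is 70% to 90% of the JVM heap, preferably 80%.
    return heap_used > spill_ratio * heap_total

def tasks_to_suspend(task_mem, heap_total):
    # Step (8): a task is suspended when its occupied memory exceeds the
    # per-task share, i.e. total heap size / number of currently running tasks.
    share = heap_total / len(task_mem)
    return [tid for tid, used in task_mem.items() if used > share]
```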
(9) Judge whether the occupied space of the JVM heap in the memory information is greater than or equal to the pressure threshold; if so, go to step (10), otherwise execute all tasks in the suspension queue;
In the present invention, the pressure threshold equals 20% to 40% of the total JVM heap size, preferably 30%;
(10) Sort all tasks in ascending order of their memory occupation growth rate to obtain a task set, and obtain the remaining space of the JVM heap;
(11) Judge whether the remaining space of the JVM heap is greater than 0. If so, pick the first task in the ascending order, subtract the free memory the task needs from the remaining space of the JVM heap to obtain the new remaining space, and go to step (12); the free memory the task needs equals (the task's occupied memory space) * (total record count of the task's data / processed record count - 1). Otherwise suspend all remaining tasks, add them to the suspension queue in order, and go to step (13);
(12) Remove the current task from the ascending-order task set, and return to step (11).
(13) When a task finishes executing, take the task at the head of the suspension queue and execute it.
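Steps (10) to (13) can be sketched as a single admission loop; this follows the literal description (the remaining-space check happens before each subtraction), and all names are illustrative:

```python
def schedule_by_growth_rate(tasks, heap_free):
    # Each task is (task_id, growth_rate, occupied_mem, total_records,
    # processed_records). Step (10): sort ascending by growth rate.
    running, suspended = [], []
    for tid, rate, occupied, total, processed in sorted(tasks, key=lambda t: t[1]):
        if heap_free > 0:
            # Step (11): memory the task still needs =
            # occupied * (total records / processed records - 1)
            need = occupied * (total / processed - 1)
            heap_free -= need
            running.append(tid)
        else:
            suspended.append(tid)  # resumed in FIFO order per step (13)
    return running, suspended
```

With 250 units free, a task that occupies 200 units and is half done needs another 200 units, leaving 50; a second half-done 100-unit task is still admitted (50 > 0), and any further task is suspended.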
As shown in Fig. 2, the invention provides a scheduling system for alleviating memory pressure in a data processing system, including a task monitoring module, a memory analysis module, and a task scheduling module.
The task monitoring module monitors the overall usage of the system's current memory resources and the execution status and memory usage of running tasks, and feeds the monitored memory information and task information back to the memory analysis module.
The memory analysis module analyzes the overall memory usage data fed back by the task monitoring module and the runtime behavior of each task. From each task's pattern of memory occupation at run time, the memory usage model the task belongs to can be classified at coarse granularity: constant, sublinear, linear, or superlinear. The analysis results and the memory usage are fed back to the task scheduling module to perform scheduling.
The task scheduling module schedules the currently running tasks in real time according to memory pressure. When memory pressure reaches the pressure threshold, tasks with a large influence on memory pressure are suspended, such as tasks whose memory usage model is superlinear, or linear tasks processing a very large amount of data. When memory pressure reaches the spill threshold, the module directly judges whether any task may spill and suspends the tasks that may spill. After a task finishes, or after memory pressure falls below the pressure threshold, tasks are resumed in order of suspension to avoid starvation.
The task monitoring module counts three main kinds of information: the current memory occupation of the system, the data the system has cached in memory, and the information of the currently running tasks. The system memory occupation and the cached data information are used to analyze memory pressure, and the information of the running tasks is used to analyze the tasks' memory usage models.
System memory occupation refers to the occupation of the heap of the data processing system's managed runtime. The statistic gathered in real time is the space occupied by all live objects in the heap; some of those objects will be reclaimed at the next garbage collection, and such objects are called transient objects. The system memory occupation reported is therefore the space occupied by heap objects after each garbage collection, rather than a real-time measurement of object footprint.
The data the system caches in memory refers to cached data information specific to in-memory computing systems. An in-memory computing system caches some important data objects in the heap to accelerate iterative computation; these objects occupy heap space directly and also relate to the tasks' memory occupation models, so their data volume and record count must be considered separately.
Data cached in memory exerts memory pressure only the first time it is cached. If memory pressure were judged from the user programming interfaces a task calls, the information cached in memory could not be taken into account; therefore the memory usage model can only be judged from memory usage.
The information of a running task includes the volume and record count of the data it needs to process, the volume and record count of the data it has processed, the manner in which its output data is stored in memory, and the volume and record count of the data stored in memory.
The memory analysis module judges the current memory pressure situation and each task's memory usage model from the three kinds of information fed back by the task monitoring module. The judgment of memory pressure serves as the trigger condition for the task scheduling module, and each task's memory usage model serves as the basis for performing scheduling.
Memory pressure is judged from the proportion of the whole memory occupied by the data objects surviving in the current system heap. The space occupied by live data objects is sampled in real time as S1, S2, S3, ..., Sn. If a sample satisfies S(i+1) < S(i), a garbage collection activity has taken place; it is then judged whether S(i+1) reaches the pressure threshold δpressure of the whole memory space Stotal. In addition, a spill threshold δspill is set to judge whether the current task is likely to spill.
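The sampling rule above can be sketched as follows; this is an illustrative sketch assuming the preferred 30% pressure ratio, with names chosen for illustration:

```python
def pressure_after_gc(samples, heap_total, pressure_ratio=0.3):
    # A drop between consecutive live-space samples S(i) -> S(i+1) signals
    # that a garbage collection has run; the post-GC sample is then compared
    # against the pressure threshold (delta_pressure * S_total).
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev and cur >= pressure_ratio * heap_total:
            return True  # live data remains high even after collection
    return False
```

If a GC drops occupation from 350 to 320 on a 1000-unit heap, 320 still exceeds the 300-unit threshold, so pressure is signaled; a drop to 150 would not signal pressure.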
Each task's memory usage model is analyzed from the programming interfaces the task calls: different programming interfaces operate on data differently, and the data they generate occupies memory differently. Judging a task's memory usage model by its memory usage is more convenient than judging by programming interface; it applies to in-memory computing systems and can be extended to data processing systems with different programming interfaces. By memory usage, tasks are coarsely divided into four classes: constant, sublinear, linear, and superlinear. Linear and superlinear tasks influence memory pressure more than constant and sublinear tasks.
A programming interface (function API) is a data processing interface the data processing system provides to the user; the user directly invokes it when writing a data program, and the system performs parallel and distributed processing automatically. Programming interfaces operate on key-value (KV) pairs. Interfaces that do not distinguish keys tend not to produce long-lived objects that occupy memory for a long time; interfaces that distinguish keys divide into aggregating and non-aggregating ones: an aggregating operation does not increase memory occupation when keys repeat, while a non-aggregating operation always increases memory occupation.
The memory usage is computed from the task's runtime monitoring data. While the system runs, the ratio of the increment of memory occupation between two samples, Δsize_memory, to the increment of data the task has processed, Δsize_input, namely Δsize_memory / Δsize_input, represents the task's memory occupation growth rate at that moment. The change of the memory occupation growth rate determines, at coarse granularity, the type of the task's memory occupation model: a task whose growth rate is zero has a constant memory usage model; a gradually decreasing growth rate indicates a sublinear model; a constant growth rate indicates a linear model; and a gradually increasing growth rate indicates a superlinear model.
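The coarse-grained classification of growth-rate trends can be sketched as below; this is an illustrative sketch (the tolerance `eps` and the catch-all "mixed" label are assumptions, not part of the description):

```python
def classify_memory_model(growth_rates, eps=1e-9):
    # Classify a task from successive growth-rate samples: zero -> constant,
    # steady -> linear, decreasing -> sublinear, increasing -> superlinear.
    if all(abs(r) < eps for r in growth_rates):
        return "constant"
    deltas = [b - a for a, b in zip(growth_rates, growth_rates[1:])]
    if all(abs(d) < eps for d in deltas):
        return "linear"
    if all(d < 0 for d in deltas):
        return "sublinear"
    if all(d > 0 for d in deltas):
        return "superlinear"
    return "mixed"  # no single coarse-grained model fits the samples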
The task scheduling module schedules the current tasks according to the memory pressure thresholds and the tasks' memory usage models analyzed by the memory analysis module. The memory analysis module provides two thresholds, δpressure and δspill, and the task scheduling module judges using the current memory occupation Scurrent and the total memory space Stotal. When Scurrent > Stotal * δspill, memory pressure is excessive and task data must be prevented from spilling to disk; when Stotal * δspill > Scurrent > Stotal * δpressure, the system has a certain degree of memory pressure and tasks must be scheduled to prevent excessive pressure from affecting all tasks; when Scurrent < Stotal * δpressure, memory pressure is moderate and no scheduling is needed.
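The three regimes can be sketched as a simple decision function; the preferred ratios 0.8 and 0.9 from the description are assumed defaults here (0.8 for δspill, 0.3 for δpressure), and the return labels are illustrative:

```python
def scheduling_regime(s_current, s_total, d_spill=0.8, d_pressure=0.3):
    # Three regimes from the analysis module's two thresholds.
    if s_current > s_total * d_spill:
        return "prevent-spill"   # pressure excessive: stop data spilling to disk
    if s_current > s_total * d_pressure:
        return "schedule"        # some pressure: schedule tasks
    return "none"                # moderate: no scheduling needed
```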
The scheduling that prevents task data from spilling to disk is based on the fact that when the managed environment allocates memory space for a thread, it limits the space the thread may use, which can cause a thread task to spill data to disk during execution. When N thread tasks run in parallel in the managed environment, the space allocated to a single thread out of the whole memory space Stotal lies between Stotal/N and Stotal/2N. During scheduling, the judgment uses the memory space Scurrent a task currently uses and the proportion p of data it has already processed: if Scurrent / (1 - p) > Stotal / N, the task must be suspended to prevent it from spilling.
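The spill prediction can be sketched as below; the guard for a finished task (p = 1) is an assumption added for robustness, and names are illustrative:

```python
def will_spill(s_current, p, s_total, n_threads):
    # Extrapolate the task's total memory need as S_current / (1 - p) and
    # compare it with the per-thread share S_total / N.
    if p >= 1.0:
        return False  # task has processed everything; nothing more to allocate
    return s_current / (1.0 - p) > s_total / n_threads
```

For example, with a 1000-unit heap shared by 10 threads, a half-done task using 60 units extrapolates to 120 > 100 and is predicted to spill, while one using 40 units extrapolates to 80 and is not.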
The scheduling triggered by reaching the pressure threshold is: when memory pressure is high, sort tasks in advance according to their memory usage models and input data volumes (in the order constant, sublinear, linear, superlinear), then schedule them in turn, combined with the spill-prevention scheduling.
By monitoring and analyzing in real time the influence of all tasks running in the data processing system on memory pressure, the present invention suspends the tasks with a large influence when memory pressure is high, resumes them after the tasks with a small influence on memory pressure complete, mitigates the effects caused by memory pressure, and at the same time improves the scalability of the serving system.
It will be readily understood by those skilled in the art that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (7)

1. A scheduling method for alleviating memory pressure in a distributed data processing system, characterized by comprising the following steps:
(1) obtaining all task information and memory information in the data processing system, wherein the task information includes input information and output information, the input information includes disk/network input information, shuffle input information, cache input information, and joint input information, the output information includes disk/network output information, shuffle output information, and cache output information, and the memory information includes the memory space occupied by each task, the occupied space of the JVM heap, and the remaining space of the JVM heap;
(2) obtaining from the data processing system all currently running tasks and the output information and memory information corresponding to each task, and computing the memory occupation growth of each task from the obtained output information and memory information;
(3) for each of the tasks, judging whether its input source is cache input or shuffle input; if so, going to step (4), otherwise going to step (5);
(4) obtaining the total size and total record count of the data set corresponding to the task, obtaining the number of records the task has processed, computing the corresponding processed data size from the processed record count, and then going to step (6);
(5) judging whether the task's input source is joint input; if not, directly obtaining the size of the data the task has processed and going to step (6); if so, separately counting the size of the data processed in each input source, accumulating the sizes obtained for the different sources, taking the sum as the processed data size, and going to step (6);
(6) obtaining the task's memory occupation growth rate from the processed data size and the memory occupation growth: growth rate = memory occupation growth / processed data size;
(7) judging whether the occupied space of the JVM heap in the memory information exceeds the spill threshold; if so, going to step (8), otherwise going to step (9);
(8) judging whether each task's occupied memory space exceeds the maximum memory space the JVM can allocate to the task; if so, suspending the task, adding it to the suspension queue, and ending the process; otherwise moving on to the next task and repeating this step;
(9) judging whether the occupied space of the JVM heap in the memory information is greater than or equal to the pressure threshold; if so, going to step (10), otherwise executing all tasks in the suspension queue;
(10) sorting all tasks in ascending order of their memory occupation growth rate to obtain a task set, and obtaining the remaining space of the JVM heap;
(11) judging whether the remaining space of the JVM heap is greater than 0; if so, picking the first task in the ascending order, subtracting the free memory the task needs from the remaining space of the JVM heap to obtain the new remaining space, and going to step (12); otherwise suspending all remaining tasks, adding them to the suspension queue in order, and going to step (13);
(12) removing the current task from the ascending-order task set, and returning to step (11);
(13) when a task finishes executing, taking the task at the head of the suspension queue and executing it.
2. dispatching method according to claim 1, it is characterised in that mission bit stream in data handling system, Memory information is while being present in data handling system and Java Virtual Machine.
3. The scheduling method according to claim 1, characterized in that:
if the output information corresponding to a task is disk/network output information, the memory occupancy growth of the task is 0;
if the output information corresponding to a task is shuffle output information or cache output information, the memory occupancy space of the task is first obtained from the memory information in the data processing system, and the previous memory occupancy space of the task is then subtracted from that value to obtain the memory occupancy growth.
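The rule of claim 3 fits in a few lines; the following is a hedged sketch in which `OutputType` and the parameter names are assumptions for illustration.

```java
// Sketch of claim 3: memory occupancy growth depends on the output type.
// OutputType and the parameter names are illustrative, not from the patent text.
enum OutputType { DISK_OR_NETWORK, SHUFFLE, CACHE }

class MemoryGrowth {
    // currentUsed: the task's memory occupancy read from the data processing system now;
    // lastUsed: the occupancy recorded at the previous sampling point.
    static long growth(OutputType type, long currentUsed, long lastUsed) {
        if (type == OutputType.DISK_OR_NETWORK) {
            return 0; // disk/network output does not grow the heap footprint
        }
        return currentUsed - lastUsed; // shuffle output or cache output
    }
}
```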
4. The scheduling method according to claim 1, characterized in that the processed data size equals (number of processed records / total number of records) * total size of the data set.
5. The scheduling method according to claim 1, characterized in that the maximum memory occupancy space of a task equals the total size of the JVM heap memory divided by the total number of tasks currently running in the data processing system.
6. The scheduling method according to claim 1, characterized in that the free memory space required by a task equals (the memory occupancy space of the task) * (the total number of records of the task's corresponding data / the number of processed records - 1).
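The three estimates in claims 4 through 6 can be written out directly; this sketch uses assumed variable names purely for illustration.

```java
// Sketch of the estimates in claims 4-6. All parameter names are illustrative.
class MemoryEstimates {
    // Claim 4: processed data size = (processed records / total records) * total data set size.
    static double processedDataSize(long processedRecords, long totalRecords, double totalSize) {
        return (double) processedRecords / totalRecords * totalSize;
    }

    // Claim 5: maximum memory occupancy per task = total JVM heap size / running task count.
    static double maxMemoryPerTask(double heapTotal, int runningTasks) {
        return heapTotal / runningTasks;
    }

    // Claim 6: required free memory = occupancy * (total records / processed records - 1),
    // i.e. the memory the task is expected to still need at its current rate of use.
    static double requiredFreeMemory(double occupancy, long totalRecords, long processedRecords) {
        return occupancy * ((double) totalRecords / processedRecords - 1);
    }
}
```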
7. A scheduling system for alleviating memory pressure in a distributed data processing system, characterized by comprising:
a first module, configured to obtain all task information and memory information in the data processing system, wherein the task information includes input information and output information, the input information includes disk/network input information, shuffle input information, cache input information, and joint input information, the output information includes disk/network output information, shuffle output information, and cache output information, and the memory information includes the memory occupancy space of each task, the occupied space of the JVM heap memory, and the remaining space of the JVM heap memory;
a second module, configured to obtain from the data processing system all currently running tasks together with the output information and memory information corresponding to each task, and to compute the memory occupancy growth of each task from the obtained output information and memory information;
a third module, configured to determine, for each of the tasks, whether its input source is cache input or shuffle input; if it is, control passes to the fourth module, otherwise to the fifth module;
a fourth module, configured to obtain the total size and total record count of the data set corresponding to the task, obtain the number of records the task has already processed, compute the corresponding processed data size from that number of processed records, and then pass control to the sixth module;
a fifth module, configured to determine whether the input source of the task is joint input; if it is not, directly obtaining the size of the data processed by the task and then passing control to the sixth module; if it is, separately counting the size of the data processed in each input source, summing the counted sizes to obtain the processed data size, and then passing control to the sixth module;
a sixth module, configured to compute the memory occupancy growth rate of the corresponding task from the processed data size and the memory occupancy growth: growth rate = memory occupancy growth / processed data size;
a seventh module, configured to determine whether the occupied space of the JVM heap memory in the memory information of the data processing system exceeds a spill threshold; if it does, control passes to the eighth module, otherwise to the ninth module;
an eighth module, configured to determine, for each task, whether its memory occupancy space exceeds the maximum memory occupancy space that the JVM can allocate to the task; if it does, suspending the task and adding it to the suspension queue, after which the process ends; otherwise repeating this step for the next task;
a ninth module, configured to determine whether the occupied space of the JVM heap memory in the memory information of the data processing system is greater than or equal to a pressure threshold; if it is, control passes to the tenth module, otherwise all tasks in the suspension queue are executed;
a tenth module, configured to sort all tasks in ascending order of their memory occupancy growth rate to obtain a task set, and to obtain the remaining space of the JVM heap memory;
an eleventh module, configured to determine whether the remaining space of the JVM heap memory is greater than 0; if it is, selecting the first task from the ascending-ordered task set, subtracting the free memory space required by that task from the remaining space of the JVM heap memory to obtain the new remaining space of the JVM heap memory, and then passing control to the twelfth module; otherwise suspending all remaining tasks, adding them to the suspension queue in turn, and then passing control to the thirteenth module;
a twelfth module, configured to remove the current task from the ascending-ordered task set and return to the eleventh module;
a thirteenth module, configured to, when a task finishes execution, take the task at the head of the suspension queue and execute it.
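The input-source branching performed by the third through sixth modules can likewise be sketched. The `InputSource` tags and all parameter names below are assumptions made for illustration, not terms from the patent.

```java
import java.util.List;

// Sketch of the input-source branching in the third through sixth modules.
// InputSource values and parameter names are illustrative.
class GrowthRateEstimator {
    enum InputSource { DISK_OR_NETWORK, SHUFFLE, CACHE, JOINT }

    // Third through fifth modules: derive the processed data size for one task.
    static double processedDataSize(InputSource source,
                                    long processedRecords, long totalRecords,
                                    double dataSetTotalSize,
                                    double directlyMeasuredSize,
                                    List<Double> perSourceSizes) {
        switch (source) {
            case CACHE:
            case SHUFFLE:
                // Fourth module: scale the data set size by the fraction of processed records.
                return (double) processedRecords / totalRecords * dataSetTotalSize;
            case JOINT:
                // Fifth module (joint input): sum the sizes counted per input source.
                return perSourceSizes.stream().mapToDouble(Double::doubleValue).sum();
            default:
                // Fifth module (disk/network input): read the processed size directly.
                return directlyMeasuredSize;
        }
    }

    // Sixth module: growth rate = memory occupancy growth / processed data size.
    static double growthRate(long memoryGrowth, double processedSize) {
        return memoryGrowth / processedSize;
    }
}
```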
CN201710273273.7A 2017-04-25 2017-04-25 Scheduling method and system for alleviating memory pressure in a distributed data processing system Pending CN107066316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710273273.7A CN107066316A (en) 2017-04-25 2017-04-25 Scheduling method and system for alleviating memory pressure in a distributed data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710273273.7A CN107066316A (en) 2017-04-25 2017-04-25 Scheduling method and system for alleviating memory pressure in a distributed data processing system

Publications (1)

Publication Number Publication Date
CN107066316A true CN107066316A (en) 2017-08-18

Family

ID=59605408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710273273.7A Pending CN107066316A (en) 2017-04-25 2017-04-25 Alleviate the dispatching method and system of memory pressure in distributed data processing system

Country Status (1)

Country Link
CN (1) CN107066316A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631730A (en) * 2013-11-01 2014-03-12 深圳清华大学研究院 Caching optimizing method of internal storage calculation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUANHUA SHI et al.: ""MURS: Mitigating Memory Pressure in Data Processing Systems for Service"", 《ARXIV:1703.08981V1》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480071A (en) * 2017-08-25 2017-12-15 深圳大学 Data cached moving method and device
CN111858064A (en) * 2020-07-29 2020-10-30 山东有人信息技术有限公司 Dynamic memory allocation method and system
CN114253457A (en) * 2020-09-21 2022-03-29 华为技术有限公司 Memory control method and device
CN112347052A (en) * 2020-11-04 2021-02-09 深圳集智数字科技有限公司 File matching method and related device
CN114840498A (en) * 2022-07-05 2022-08-02 北京优合融宜科技有限公司 Method and device for realizing memory key value data management based on Java technology
CN114840498B (en) * 2022-07-05 2022-09-13 北京优合融宜科技有限公司 Method and device for realizing memory key value data management based on Java technology
CN116089319A (en) * 2022-08-30 2023-05-09 荣耀终端有限公司 Memory processing method and related device
CN116089319B (en) * 2022-08-30 2023-10-31 荣耀终端有限公司 Memory processing method and related device

Similar Documents

Publication Publication Date Title
CN107066316A (en) Scheduling method and system for alleviating memory pressure in a distributed data processing system
CN110333937A (en) Task distribution method, device, computer equipment and storage medium
US8868623B2 (en) Enhanced garbage collection in a multi-node environment
CN107168782A (en) A kind of concurrent computational system based on Spark and GPU
CN104301404B (en) A kind of method and device of the adjustment operation system resource based on virtual machine
CN109828833A (en) A kind of queuing system and its method of neural metwork training task
CN108153587B (en) Slow task reason detection method for big data platform
CN102739785B (en) Method for scheduling cloud computing tasks based on network bandwidth estimation
CN104504103A (en) Vehicle track point insert performance optimization method, vehicle track point insert performance optimization system, information collector and database model
CN114675956B (en) Method for configuration and scheduling of Pod between clusters based on Kubernetes
CN108664394A (en) A kind of RAM leakage process tracing method and device
Zhong et al. Speeding up Paulson’s procedure for large-scale problems using parallel computing
CN108196939A (en) For the virtual machine intelligent management and device of cloud computing
CN111881165B (en) Data aggregation method and device and computer readable storage medium
Yang et al. Design of kubernetes scheduling strategy based on LSTM and grey model
US20130238866A1 (en) System and Method for Robust and Efficient Free Chain Management
US7536674B2 (en) Method and system for configuring network processing software to exploit packet flow data locality
CN108920951A (en) A kind of security audit frame based under cloud mode
CN112783892A (en) Chained task execution engine realized through event-driven model
CN106371912A (en) Method and device for resource dispatching of stream-type computation system
Opderbeck et al. The renewal model for program behavior
CN112416539B (en) Multi-task parallel scheduling method for heterogeneous many-core processor
CN108491270A (en) The methods, devices and systems of statistical data
CN117539613B (en) Method for managing shared resource in distributed computing system
CN112449061B (en) Outbound task allocation method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170818