CN115964159A - Big data engine memory management method and device - Google Patents

Big data engine memory management method and device

Info

Publication number
CN115964159A
CN115964159A · Application CN202111194693.9A
Authority
CN
China
Prior art keywords
memory
big data
data engine
heap
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111194693.9A
Other languages
Chinese (zh)
Inventor
翟舒珂 (Zhai Shuke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Xiongan ICT Co Ltd and China Mobile System Integration Co Ltd
Priority to CN202111194693.9A
Publication of CN115964159A
Legal status: Pending (current)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a big data engine memory management method and device. The method comprises: initializing a network manager so that a network cache pool generates a preset number of memory segments; creating a buffer pool based on a task execution thread and setting an upper limit on memory segment requests; receiving memory segment requests and allocating memory segments accordingly, and requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment. The big data engine memory management method and device provided by the embodiments of the invention reduce the Full GC time of the system memory and improve system operating efficiency.

Description

Big data engine memory management method and device
Technical Field
The invention relates to the technical field of big data storage, and in particular to a big data engine memory management method and device.
Background
Currently, big data computing engines are mainly implemented in Java or in JVM (Java Virtual Machine)-based programming languages; examples include Apache Hadoop, Apache Spark, Apache Drill and Apache Flink. An advantage of the Java language is that programmers need not pay much attention to managing low-level memory resources, because the JVM provides automatic mechanisms for memory allocation, destruction and reclamation; this greatly eases software development and lets developers concentrate on business logic.
However, in big data application scenarios the data volume processed by a big data computing engine is often huge, and relying on the JVM alone to cache and efficiently process data in memory causes many problems. In such scenarios, JVM memory management has the following drawbacks: the effective memory utilization of JVM objects is low; garbage collection (GC) takes a long time, seriously affecting interaction efficiency; and OutOfMemoryError occurs frequently, crashing the JVM and seriously affecting normal system operation.
Disclosure of Invention
The invention provides a big data engine memory management method and device to solve the above problems in the prior art.
The invention provides a big data engine memory management method, which comprises the following steps:
initializing a network manager so that a network cache pool generates a preset number of memory segments;
creating a buffer pool based on a task execution thread, and setting an upper limit on memory segment requests;
receiving memory segment requests and allocating memory segments accordingly; requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment;
the big data engine memory management method being executed on a big data engine memory management model.
According to the big data engine memory management method provided by the invention, the method further comprises: releasing a memory segment back to the buffer pool after the memory segment has been consumed.
According to the big data engine memory management method provided by the invention, the method further comprises: releasing memory segments in the buffer pool back to the network cache pool when the number of requested memory segments exceeds the upper limit.
According to the big data engine memory management method provided by the invention, before initializing the network manager the method further comprises writing data to be processed into a memory segment, which specifically includes: loading a corresponding serializer based on the type of the data to be processed; serializing the data to be processed with the serializer to obtain binary data; and writing the binary data into the memory segment.
According to the big data engine memory management method provided by the invention, the big data engine memory management model comprises a heap memory area, and the heap memory area comprises an engine on-heap memory area and a task on-heap memory area.
According to the big data engine memory management method provided by the invention, the big data engine memory management model further comprises an off-heap memory area, and the off-heap memory area comprises an engine off-heap memory area, a task off-heap memory area, a network buffer memory area, a managed memory area, a JVM metaspace and a JVM execution overhead area.
According to the big data engine memory management method provided by the invention, the task on-heap memory area comprises: a network cache pool, a memory resource management pool and a reserved heap memory area.
The invention also provides a big data engine memory management device, comprising:
an initialization module configured to: initialize a network manager so that a network cache pool generates a preset number of memory segments;
a creation module configured to: create a buffer pool based on a task execution thread, and set an upper limit on memory segment requests;
an allocation module configured to: receive memory segment requests and allocate memory segments accordingly; request memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stop the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the big data engine memory management methods.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the big data engine memory management method as described in any of the above.
The big data engine memory management method and device provided by the invention create an input buffer pool and an output buffer pool based on the task execution thread and connect these buffer pools with the network cache pool, so that memory segments pass through two levels of buffering; meanwhile, the life cycle of the input and output buffer pools is kept synchronized with the creation and destruction of the task manager, so that the memory segments in the buffer pools are never subjected to garbage collection. On this basis, the Full GC time of the system memory (scanning the whole heap memory area and reclaiming memory) is reduced, and system operating efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a big data engine memory management method provided by the present invention;
FIG. 2 is a schematic diagram of data writing to a memory segment according to the present invention;
FIG. 3 is a schematic structural diagram of a big data engine memory management model provided by the present invention;
FIG. 4 is a schematic diagram of memory management in network transmission according to the present invention;
FIG. 5 is a schematic diagram of a big data engine memory management device according to the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a big data engine memory management method provided by the present invention, and as shown in fig. 1, the method includes:
s110, initializing a network manager to enable a network cache pool to generate a preset number of memory segments;
s120, creating a buffer pool based on the task execution thread, and setting an application upper limit of the memory segment;
s130, receiving application information of the memory segment, distributing the memory segment based on the application information, and applying for the memory segment to the network cache pool when no available memory segment exists in the cache pool and the application number of the memory segment is less than the application upper limit; when the application number of the memory segments is larger than the application upper limit or the network cache pool has no available memory segments, stopping the task execution thread;
and executing the memory management method of the big data engine on a memory management model of the big data engine.
It should be noted that when the task manager starts, the network manager is initialized accordingly. The network manager manages the network cache pool, which generates a certain number of memory segments (2048 by default) under the network manager's direction; these memory segments represent all the memory available for network transmission.
When a task execution thread starts, it registers with the network manager; the network manager then creates buffer pools for the thread, divided into an input buffer pool and an output buffer pool, and sets an upper limit on memory segment requests for each.
During task execution, taking task input as an example: when the data receiving end receives data, the task input thread requests a memory segment from the input buffer pool in order to copy the data into the task. If the input buffer pool has no available memory segment and the number of requested segments has not reached the pool's upper limit, a memory segment is requested from the network cache pool, and the obtained segment is handed to the task input thread to be filled with data. When the number of requested segments exceeds the input buffer pool's upper limit or the network cache pool has no available memory segment, the task execution thread is stopped.
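For illustration only, this two-level request path can be sketched in Java roughly as follows; the class and method names (NetworkBufferPool, LocalBufferPool, request, recycle, setQuota) are hypothetical choices for the sketch, not names taken from the patent, and a production engine such as Apache Flink differs in many details:
    import java.util.ArrayDeque;

    /** Fixed pool of 32 KB memory segments created when the network manager starts. */
    class NetworkBufferPool {
        private final ArrayDeque<byte[]> segments = new ArrayDeque<>();

        NetworkBufferPool(int numSegments, int segmentSize) {
            for (int i = 0; i < numSegments; i++) {
                segments.add(new byte[segmentSize]); // e.g. 2048 segments of 32 KB by default
            }
        }

        synchronized byte[] requestSegment() { return segments.poll(); } // null if exhausted

        synchronized void recycle(byte[] segment) { segments.add(segment); }
    }

    /** Per-thread buffer pool; falls back to the network cache pool up to an upper limit. */
    class LocalBufferPool {
        private final NetworkBufferPool networkPool;
        private final ArrayDeque<byte[]> available = new ArrayDeque<>();
        private int quota; // upper limit on segments this pool may hold (can shrink later)
        private int held;  // segments currently obtained from the network cache pool

        LocalBufferPool(NetworkBufferPool networkPool, int quota) {
            this.networkPool = networkPool;
            this.quota = quota;
        }

        /** Returns a segment, or null: the caller must then stop the task thread. */
        synchronized byte[] request() {
            byte[] seg = available.poll();
            if (seg == null && held < quota) {      // below the upper limit:
                seg = networkPool.requestSegment(); // fall through to the network cache pool
                if (seg != null) held++;
            }
            return seg; // null: limit reached or network pool empty (back pressure)
        }

        /** Called after a segment is consumed; surplus segments go back to the network pool. */
        synchronized void release(byte[] segment) {
            if (held > quota) { networkPool.recycle(segment); held--; }
            else { available.add(segment); }
        }

        synchronized void setQuota(int newQuota) { quota = newQuota; } // used on redistribution
    }
A null return from request models the stopping condition of S130, and the release path models the recycling mechanism described below.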
The big data engine memory management method provided by the invention creates an input buffer pool and an output buffer pool based on the task execution thread and connects them with the network cache pool, so that memory segments pass through two levels of buffering; meanwhile, the life cycle of the input and output buffer pools is kept synchronized with the creation and destruction of the task manager, so that the memory segments in the buffer pools are never subjected to garbage collection. On this basis, the Full GC time of the system memory (scanning the whole heap memory area and reclaiming memory) is reduced, and system operating efficiency is improved.
According to the big data engine memory management method provided by the invention, the method further comprises: releasing a memory segment back to the buffer pool after it has been consumed; and releasing memory segments in the buffer pool back to the network cache pool when the number of requested memory segments exceeds the upper limit.
It should be noted that after a memory segment has been consumed, that is, after it has been used to complete a task input or task output action, it is released back to the buffer pool so that it can be used again in the next task execution. When the number of requested memory segments exceeds the upper limit, the demand for memory segments in the input or output buffer pool has exceeded its quota, and memory segments must be requested from the network cache pool again; memory segments in the buffer pool are therefore released back to the network cache pool, to ensure that the number of memory segments in the network cache pool remains sufficient.
The big data engine memory management method provided by the invention thus establishes a release-and-recycle mechanism for memory segments: a segment is released back to the buffer pool when its consumption is finished, and segments in the buffer pool are released back to the network cache pool when the number of requested segments exceeds the upper limit. These processes achieve recycling and reasonable allocation of memory segments and improve their reuse rate and utilization.
According to the big data engine memory management method provided by the invention, before initializing the network manager the method further comprises writing data to be processed into a memory segment, which specifically includes:
loading a corresponding serializer based on the type of the data to be processed;
serializing the data to be processed with the serializer to obtain binary data;
and writing the binary data into the memory segment.
It should be noted that serializing data is the process of converting data structures and objects into binary form, and different serializers correspond to different data types, specifically including: any Java basic type (boxed) or String; arrays of any Java basic type (boxed) or of String; any implementation class of the Hadoop Writable interface; any Tuple type (Tuple1 to Tuple25 are supported); any Scala case class type; and any POJO type.
After the data to be processed has been converted into binary data by the corresponding serializer, the resulting binary data is written into a memory segment for storage. In the invention, the memory segment is the minimum memory allocation unit of the big data engine, with a default size of 32 KB. To facilitate management of memory segments, the concept of a memory page is introduced: a memory page represents a data access view over memory segments, with data writing abstracted as a data output view and, correspondingly, data reading abstracted as a data input view. Memory segments and memory pages are managed by a memory manager.
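As a minimal sketch of this view abstraction (the names are assumptions for the sketch; a real engine wraps raw memory rather than a heap ByteBuffer):
    import java.nio.ByteBuffer;

    /** 32 KB memory segment, the minimum allocation unit. */
    final class MemorySegment {
        static final int DEFAULT_SIZE = 32 * 1024;
        final ByteBuffer buffer = ByteBuffer.allocate(DEFAULT_SIZE);
    }

    /** Data output view: sequential writes over a segment. */
    final class DataOutputView {
        private final MemorySegment seg;
        DataOutputView(MemorySegment seg) { this.seg = seg; }
        void writeInt(int v)       { seg.buffer.putInt(v); }
        void writeDouble(double v) { seg.buffer.putDouble(v); }
        void writeBytes(byte[] b)  { seg.buffer.put(b); }
    }

    /** Data input view: mirrors the output view for sequential reads. */
    final class DataInputView {
        private final MemorySegment seg;
        DataInputView(MemorySegment seg) { this.seg = seg; seg.buffer.flip(); }
        int readInt()       { return seg.buffer.getInt(); }
        double readDouble() { return seg.buffer.getDouble(); }
    }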
Before data is written into a memory segment, the data structure or object must be serialized, i.e., converted into binary data. Because the type of the objects in a data set is fixed, only one piece of schema information needs to be stored for the whole data set, and fixed-size types can be accessed at fixed offsets, saving a large amount of storage space. Moreover, when a member variable needs to be accessed, the whole Java object need not be deserialized; the specific member variable can be deserialized directly via its offset. For objects with many member variables, this greatly reduces the creation overhead of Java objects and the amount of memory data copied.
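A minimal sketch of this offset-based access, assuming for illustration a record laid out as a 4-byte int followed by an 8-byte double:
    import java.nio.ByteBuffer;

    class OffsetAccessExample {
        static byte[] serialize(int a, double b) {       // layout: [int @ 0][double @ 4]
            return ByteBuffer.allocate(12).putInt(a).putDouble(b).array();
        }
        static double readDoubleField(byte[] record) {
            return ByteBuffer.wrap(record).getDouble(4); // fixed offset, no full deserialization
        }
        public static void main(String[] args) {
            System.out.println(readDoubleField(serialize(7, 2.5))); // prints 2.5
        }
    }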
According to the big data engine memory management method provided by the invention, data is stored in memory segments after serialization, which reduces space consumption and increases object storage density. Meanwhile, memory pages provide orderly management of writes to and reads from memory segments, improving data access efficiency and reducing the creation overhead of Java objects and the amount of memory data copied.
According to the big data engine memory management method provided by the invention, the big data engine memory management model comprises a heap memory area, the heap memory area comprising an engine on-heap memory area and a task on-heap memory area.
It should be noted that the engine on-heap memory area provides the memory used to run the big data computing engine framework, and the task on-heap memory area provides the heap memory used to execute user task code. In the invention, the heap memory area is further subdivided according to actual requirements, so that the heap memory area is pre-allocated and its use is planned in advance.
The big data engine memory management method provided by the invention divides the heap memory area into the engine on-heap memory area and the task on-heap memory area, realizing pre-allocation of the heap memory area, ensuring that memory use is planned, avoiding memory conflicts, and allowing the heap memory area to exert its storage capacity to the greatest extent.
According to the big data engine memory management method provided by the invention, the big data engine memory management model further comprises an off-heap memory area, the off-heap memory area comprising an engine off-heap memory area, a task off-heap memory area, a network buffer memory area, a managed memory area, a JVM metaspace and a JVM execution overhead area.
It should be noted that the engine off-heap memory area provides the off-heap memory used by the big data engine framework; the task off-heap memory area provides the off-heap memory used to execute user task code; the network buffer memory area provides the off-heap memory used for network exchange, to facilitate network data exchange; and the managed memory area provides off-heap memory managed by the big data engine, which can be used for sorting, hash tables, caching intermediate results and the like. The memory used by the JVM itself comprises the JVM metaspace and the JVM execution overhead area, i.e., the memory the JVM requires during its own execution, including memory used by thread stacks, IO, the compile cache and so on.
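For illustration, the total process memory under this model is simply the sum of the areas above; the sizes in the following sketch are assumptions chosen for the example (only the 2048 x 32 KB network default comes from the description), not values prescribed by the invention:
    class MemoryModelBudget {
        public static void main(String[] args) {
            long MB = 1024L * 1024;
            long engineHeap    = 128 * MB;           // engine on-heap area (framework)
            long taskHeap      = 1024 * MB;          // task on-heap area (user code)
            long engineOffHeap = 128 * MB;           // engine off-heap area
            long taskOffHeap   = 0;                  // task off-heap area
            long networkBuffer = 2048L * 32 * 1024;  // 2048 segments of 32 KB (default)
            long managed       = 512 * MB;           // sorting, hash tables, cached results
            long jvmMetaspace  = 96 * MB;
            long jvmOverhead   = 192 * MB;           // thread stacks, IO, compile cache
            long total = engineHeap + taskHeap + engineOffHeap + taskOffHeap
                       + networkBuffer + managed + jvmMetaspace + jvmOverhead;
            System.out.println("total process memory = " + (total / MB) + " MB"); // 2144 MB
        }
    }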
The big data engine memory management method provided by the invention divides the heap space into a heap memory area and an off-heap memory area, and in particular divides the off-heap memory area into the engine off-heap memory area, the task off-heap memory area, the network buffer memory area, the managed memory area, the JVM metaspace and the JVM execution overhead area. This greatly reduces the heap memory area and thus shortens JVM startup time. Meanwhile, under this dedicated memory management model, only the heap memory area is affected when the JVM fails; the off-heap memory area is unaffected, does not lose data, and can be used for failure recovery. In summary, the off-heap memory area greatly reduces garbage collection overhead in big data scenarios and improves memory utilization efficiency.
According to the big data engine memory management method provided by the invention, the task on-heap memory area comprises: a network cache pool, a memory resource management pool and a reserved heap memory area.
It should be noted that when the task process starts, the task on-heap memory area is divided into three parts: the network cache pool, the memory resource management pool and the reserved heap memory area, where:
Network cache pool: a set of buffers composed of a certain number of 32 KB memory segments, used mainly for network transmission; the task manager allocates memory segments to it at startup, 2048 by default.
Memory resource management pool: managed by the memory manager. Operators of the big data engine such as sort, shuffle and join request memory segments from this pool, store serialized data in them, and release the memory back to the pool when the segments are no longer needed.
Reserved heap memory area: the memory in the reserved heap memory area is left for user code and for the data structures run by the task manager itself. These are generally small; from the viewpoint of JVM garbage collection they belong to the young generation, i.e., they are short-lived objects generated by user code.
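This three-way split at task start-up can be illustrated as follows (the 70% managed fraction is an assumption for the sketch, not a ratio fixed by the method, and maxMemory() merely approximates the task on-heap area):
    class TaskHeapLayout {
        public static void main(String[] args) {
            long heap        = Runtime.getRuntime().maxMemory();    // approximates the task on-heap area
            long networkPool = 2048L * 32 * 1024;                   // network cache pool: 2048 x 32 KB
            long managedPool = (long) ((heap - networkPool) * 0.7); // memory resource management pool
            long reserved    = heap - networkPool - managedPool;    // short-lived user objects
            System.out.printf("network=%dMB managed=%dMB reserved=%dMB%n",
                    networkPool >> 20, managedPool >> 20, reserved >> 20);
        }
    }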
By specifically dividing the task on-heap memory into the network cache pool, the memory resource management pool and the reserved heap memory area when the task process starts, the big data engine memory management method provided by the invention further subdivides the task on-heap memory area, provides a structural basis for allocating memory segments during task execution, and ensures the smooth operation of the network cache pool and buffer pool management mechanism used in the method.
Fig. 2 is a schematic diagram of data writing into a memory segment provided by the present invention. As shown in fig. 2, the data writing process is as follows:
The tuple shown in the figure consists of three elements: an integer (int), a double and an object type (Person), where the Person object contains an integer variable and a string variable. The variables are serialized into binary data according to their data types, and the binary data is then written into the memory segment. The specific steps are: the integer int is serialized by the integer serializer into 4 bytes of binary data; the double is serialized by the double serializer into 8 bytes; the Person object is serialized by the object serializer into 1 byte (its header), after which its integer member is serialized by the integer serializer into 4 bytes and its string member is serialized by the string serializer into variable-length serialized bytes.
Aggregation (group), sorting (sort) and association (join) operations are frequently performed in big data engine computations and require access to massive amounts of data. The following takes sorting as an example to illustrate how operations are performed directly on binary data.
First, the big data engine requests a batch of memory segments from the memory manager; these memory segments serve as the sort buffer, holding the data to be sorted. The sort buffer is divided into two regions: one region stores the complete binary data of all objects; the other stores pointers to that binary data together with the serialized sort key, for fixed-length data types. If the sort key is a variable-length type such as String, a prefix of it is serialized instead.
Storing the actual data separately from the pointer-plus-fixed-length keys has the following advantages: (1) fixed-length blocks are exchanged more efficiently, without exchanging the real data or moving other sort keys and pointers; (2) all sort keys are stored contiguously in memory, which greatly reduces cache misses.
The core of sorting is comparing and swapping. Sizes are compared first using the sort keys, so the binary sort keys can be compared directly without deserializing the whole object. Because the sort keys are fixed-length, if two keys are equal the actual binary data must be deserialized and compared. Once the comparison is done, only the sort keys and pointers need to be swapped to achieve the sorting effect; the real data never has to move.
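A compact sketch of such a sort buffer (class and field names are illustrative; the tie-breaking deserialization step mentioned above is omitted for brevity):
    import java.util.ArrayList;
    import java.util.List;

    class BinarySortBuffer {
        // Region 1: complete serialized records, append-only; never moved during sorting.
        private final List<byte[]> records = new ArrayList<>();
        // Region 2: fixed-length entries: serialized sort key (or prefix) plus record index.
        private final List<long[]> index = new ArrayList<>(); // [0] = key, [1] = pointer

        void add(long sortKey, byte[] serializedRecord) {
            index.add(new long[] { sortKey, records.size() });
            records.add(serializedRecord);
        }

        /** Compares binary keys directly and swaps only the (key, pointer) entries. */
        void sort() {
            index.sort((a, b) -> Long.compare(a[0], b[0]));
        }

        byte[] get(int sortedPosition) {
            return records.get((int) index.get(sortedPosition)[1]);
        }
    }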
According to the big data engine memory management method, different serializers are selected for different data types, and the corresponding serializer is used to serialize the data into the corresponding binary form.
Fig. 3 is a schematic structural diagram of the big data engine memory management model provided by the present invention. As shown in fig. 3, the big data engine memory management model comprises a heap memory area and an off-heap memory area; the heap memory area comprises an engine on-heap memory area and a task on-heap memory area, and the off-heap memory area comprises an engine off-heap memory area, a task off-heap memory area, a network buffer memory area, a managed memory area, a JVM metaspace and a JVM execution overhead area.
It should be noted that, in the present invention, the big data engine memory management model concerns the partitioning of the heap area of the JVM's memory space. Of the memory managed by the JVM, the heap area is the largest block and is the main memory area managed by the Java garbage collection (GC) mechanism. The heap area is shared by all threads and is created when the virtual machine starts; it stores object instances and array values, and all objects created with new in Java can be considered to be allocated there.
The big data engine memory management method provided by the invention divides the heap memory area into the engine on-heap memory area and the task on-heap memory area, realizing pre-allocation of the heap memory area, ensuring that memory use is planned and avoiding memory conflicts. Meanwhile, an off-heap memory area is introduced into the memory management model and divided into the engine off-heap memory area, the task off-heap memory area, the network buffer memory area, the managed memory area and so on. Introducing the off-heap memory area greatly reduces the heap memory area and thereby shortens JVM startup time. The off-heap memory area allows zero-copy when spilling to disk or transmitting over the network, whereas the heap memory area requires at least one copy. When the JVM fails, only the heap memory area is affected; the off-heap memory area does not lose data and can be used for failure recovery. In conclusion, the off-heap memory area greatly reduces garbage collection overhead in big data scenarios and improves memory utilization efficiency.
Fig. 4 is a schematic diagram of memory management in network transmission provided by the present invention. As shown in fig. 4, data is transmitted between task operators of the big data engine at the network layer using memory buffers; the request and release of these buffers are managed by the big data engine itself. A buffer is organized in units of memory segments, and the big data engine manages buffers through a buffer resource pool, covering buffer requests, release, destruction and available-buffer notification. Each task has its own buffer resource pool. The big data engine relies on a big data cluster, and while a big data task runs, frequent data exchange takes place between the nodes of the cluster. The black rectangles in fig. 4 represent memory segments, and a buffer is an encapsulation of a memory segment. Each task contains an input and an output, and the input and output data are stored in buffers.
When the task manager starts, the network manager is initialized; everything network-related inside the task manager is managed by the network manager. The network manager contains the network cache pool, which generates a certain number of memory segments (2048 by default); these memory segments represent all the memory available for network transmission.
When a task execution thread starts, it registers with the network manager, which creates a buffer pool each for input and output and sets the number of memory segments each may request. The memory of the buffer pools is allocated dynamically: whenever a buffer pool is created or destroyed, the network cache pool recalculates the number of idle memory blocks and distributes them evenly among the buffer pools created by the network manager, so that the buffer pools use as much memory as possible; the more memory a pool has, the more easily the system copes with momentary pressure.
During task execution, when the receiving end receives data, the task input thread requests memory from the corresponding buffer pool in order to copy the data into the task. If the buffer pool has no available memory segment and the number of requested segments has not reached the pool's upper limit, a memory segment is requested from the network cache pool, and the obtained segment is handed to the task input thread or task output thread to be filled with data. If the buffer pool has reached its request limit or the network cache pool has no available memory segment, reading for the task is suspended, the upstream sender immediately responds by stopping sending, and the topology enters a back-pressure state. When a memory segment has been consumed, it is released back to the buffer pool; if the number of segments currently requested by the buffer pool exceeds its capacity, the buffer pool recycles the segment back to the network cache pool.
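The even redistribution step might look as follows, reusing the hypothetical LocalBufferPool from the earlier sketch; setQuota shrinks a pool's limit, so surplus segments flow back to the network cache pool on release:
    import java.util.List;

    class Redistributor {
        /** Spreads idle network segments evenly over the live buffer pools. */
        static void redistribute(List<LocalBufferPool> pools, int idleSegments) {
            if (pools.isEmpty()) return;
            int share = idleSegments / pools.size();  // even share per pool
            int rest  = idleSegments % pools.size();  // first pools absorb the remainder
            for (int i = 0; i < pools.size(); i++) {
                pools.get(i).setQuota(share + (i < rest ? 1 : 0));
            }
        }
    }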
The big data engine memory management method provided by the invention creates an input buffer pool and an output buffer pool based on the task execution thread and connects them with the network cache pool, so that memory segments pass through two levels of buffering; meanwhile, the life cycle of the input and output buffer pools is kept synchronized with the creation and destruction of the task manager, so that the memory segments in the buffer pools are never subjected to garbage collection. On this basis, the Full GC time of the system memory is reduced and the operating pressure on the system is relieved.
Fig. 5 is a schematic structural diagram of a big data engine memory management device provided in the present invention, and as shown in fig. 5, the device includes:
an initialization module 510 configured to: initialize a network manager so that a network cache pool generates a preset number of memory segments;
a creation module 520 configured to: create a buffer pool based on a task execution thread, and set an upper limit on memory segment requests;
an allocation module 530 configured to: receive memory segment requests and allocate memory segments accordingly; request memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stop the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
The big data engine memory management device provided by the invention creates an input buffer pool and an output buffer pool based on the task execution thread and connects them with the network cache pool, so that memory segments pass through two levels of buffering; meanwhile, the life cycle of the input and output buffer pools is kept synchronized with the creation and destruction of the task manager, so that the memory segments in the buffer pools are never subjected to garbage collection. On this basis, the Full GC time of the system memory (scanning the whole heap memory area and reclaiming memory) is reduced, and system operating efficiency is improved.
According to the big data engine memory management device provided by the present invention, the device 500 further comprises a releasing module 540, the releasing module 540 being configured to release a memory segment back to the buffer pool after the memory segment has been consumed.
By establishing this release-and-recycle mechanism, i.e., releasing a memory segment back to the buffer pool when its consumption is finished, the big data engine memory management device provided by the invention achieves recycling and reasonable allocation of memory segments and improves their reuse rate and utilization.
According to the big data engine memory management device provided by the present invention, the device 500 further comprises a determining module 550, the determining module 550 being configured to release memory segments in the buffer pool back to the network cache pool when the number of requested memory segments exceeds the upper limit.
By releasing memory segments in the buffer pool back to the network cache pool when the number of requested segments exceeds the upper limit, the big data engine memory management device provided by the invention likewise achieves recycling and reasonable allocation of memory segments and improves their reuse rate and utilization.
According to the big data engine memory management device provided by the present invention, the device 500 further comprises a writing module 560, the writing module 560 being configured to write data to be processed into a memory segment before the network manager is initialized, specifically by: loading a corresponding serializer based on the type of the data to be processed; serializing the data to be processed with the serializer to obtain binary data; and writing the binary data into the memory segment.
According to the big data engine memory management device provided by the invention, data is stored in memory segments after serialization, which reduces space consumption and increases object storage density. Meanwhile, memory pages provide orderly management of writes to and reads from memory segments, improving data access efficiency and reducing the creation overhead of Java objects and the amount of memory data copied.
According to the present invention, the device 500 runs on a big data engine memory management model, and the big data engine memory management model comprises a heap memory area, the heap memory area comprising an engine on-heap memory area and a task on-heap memory area.
The big data engine memory management device provided by the invention divides the heap memory area into the engine on-heap memory area and the task on-heap memory area, realizing pre-allocation of the heap memory area, ensuring that memory use is planned, avoiding memory conflicts, and allowing the heap memory area to exert its storage capacity to the greatest extent.
According to the big data engine memory management device provided by the present invention, the device 500 runs on a big data engine memory management model, and the big data engine memory management model further comprises an off-heap memory area, the off-heap memory area comprising an engine off-heap memory area, a task off-heap memory area, a network buffer memory area, a managed memory area, a JVM metaspace and a JVM execution overhead area.
The big data engine memory management device provided by the invention divides the heap space into a heap memory area and an off-heap memory area, and in particular divides the off-heap memory area into the engine off-heap memory area, the task off-heap memory area, the network buffer memory area, the managed memory area, the JVM metaspace and the JVM execution overhead area, greatly reducing the heap memory area and thus shortening JVM startup time. Meanwhile, under this dedicated memory management model, only the heap memory area is affected when the JVM fails; the off-heap memory area is unaffected, does not lose data, and can be used for failure recovery. In summary, the off-heap memory area greatly reduces garbage collection overhead in big data scenarios and improves memory utilization efficiency.
According to the big data engine memory management device provided by the present invention, the device 500 runs on a big data engine memory management model, and the task on-heap memory area of the big data engine memory management model comprises: a network cache pool, a memory resource management pool and a reserved heap memory area.
By specifically dividing the task on-heap memory into the network cache pool, the memory resource management pool and the reserved heap memory area when the task process starts, the big data engine memory management device provided by the invention further subdivides the task on-heap memory area, provides a structural basis for allocating memory segments during task execution, and ensures the smooth operation of the network cache pool and buffer pool management mechanism used in the big data engine memory management method.
Fig. 6 illustrates the physical structure of an electronic device. As shown in fig. 6, the electronic device may comprise: a processor 610, a communication interface 620, a memory 630 and a communication bus 640, the processor 610, the communication interface 620 and the memory 630 communicating with one another through the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the big data engine memory management method, comprising: initializing a network manager so that a network cache pool generates a preset number of memory segments; creating a buffer pool based on a task execution thread and setting an upper limit on memory segment requests; receiving memory segment requests and allocating memory segments accordingly, and requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that substantially contributes over the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to execute the big data engine memory management method provided above, the method comprising: initializing a network manager so that a network cache pool generates a preset number of memory segments; creating a buffer pool based on a task execution thread and setting an upper limit on memory segment requests; receiving memory segment requests and allocating memory segments accordingly, and requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the big data engine memory management method provided above, the method comprising: initializing a network manager so that a network cache pool generates a preset number of memory segments; creating a buffer pool based on a task execution thread and setting an upper limit on memory segment requests; receiving memory segment requests and allocating memory segments accordingly, and requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, or by hardware. Based on this understanding, the above technical solutions, or the part of them that substantially contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute the method described in the various embodiments or parts thereof.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A big data engine memory management method, characterized by comprising the following steps:
initializing a network manager so that a network cache pool generates a preset number of memory segments;
creating a buffer pool based on a task execution thread, and setting an upper limit on memory segment requests;
receiving memory segment requests and allocating memory segments accordingly; requesting memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stopping the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment;
the big data engine memory management method being executed on a big data engine memory management model.
2. The big data engine memory management method of claim 1, further comprising: releasing a memory segment back to the buffer pool after the memory segment has been consumed.
3. The big data engine memory management method of claim 1, further comprising: releasing memory segments in the buffer pool back to the network cache pool when the number of requested memory segments exceeds the upper limit.
4. The big data engine memory management method of claim 1, further comprising, before initializing the network manager: writing data to be processed into a memory segment, which specifically comprises:
loading a corresponding serializer based on the type of the data to be processed;
serializing the data to be processed with the serializer to obtain binary data;
and writing the binary data into the memory segment.
5. The big data engine memory management method of claim 1, wherein the big data engine memory management model comprises a heap memory area, the heap memory area comprising an engine on-heap memory area and a task on-heap memory area.
6. The big data engine memory management method of claim 5, wherein the big data engine memory management model further comprises an off-heap memory area, the off-heap memory area comprising an engine off-heap memory area, a task off-heap memory area, a network buffer memory area, a managed memory area, a JVM metaspace and a JVM execution overhead area.
7. The big data engine memory management method of claim 6, wherein the task on-heap memory area comprises: a network cache pool, a memory resource management pool and a reserved heap memory area.
8. A big data engine memory management device, comprising:
an initialization module configured to: initialize a network manager so that a network cache pool generates a preset number of memory segments;
a creation module configured to: create a buffer pool based on a task execution thread, and set an upper limit on memory segment requests;
an allocation module configured to: receive memory segment requests and allocate memory segments accordingly; request memory segments from the network cache pool when the buffer pool has no available memory segment and the number of requested segments is below the upper limit; and stop the task execution thread when the number of requested segments exceeds the upper limit or the network cache pool has no available memory segment; the big data engine memory management method being executed on a big data engine memory management model.
9. An electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, wherein said processor when executing said program performs the steps of the big data engine memory management method according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of the big data engine memory management method according to any of claims 1 to 7.
CN202111194693.9A, filed 2021-10-13 (priority date 2021-10-13): Big data engine memory management method and device. Pending. Published as CN115964159A.

Priority Applications (1)

CN202111194693.9A (priority and filing date 2021-10-13): Big data engine memory management method and device

Publications (1)

CN115964159A (published 2023-04-14)

Family

ID=87358524

Family Applications (1)

CN202111194693.9A: Big data engine memory management method and device (pending)

Country Status (1)

Country Link
CN (1) CN115964159A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination