CN116700995B - Concurrent access method, device, equipment and storage medium for heterogeneous memory pool - Google Patents

Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Info

Publication number
CN116700995B
Authority
CN
China
Prior art keywords
memory
data
input data
data block
size
Prior art date
Legal status
Active
Application number
CN202310967987.3A
Other languages
Chinese (zh)
Other versions
CN116700995A (en)
Inventor
赵雅倩
高开
郭振华
王丽
曹芳
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN202310967987.3A
Publication of CN116700995A
Application granted
Publication of CN116700995B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 Indexing structures
    • G06F 16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 Indexing structures
    • G06F 16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2455 Query execution
    • G06F 16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0673 Single storage device
    • G06F 3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a concurrent access method, apparatus, device and storage medium for a heterogeneous memory pool, relating to the technical field of memory access. The method comprises the following steps: acquiring input data related to a data processing model, and splitting the input data to obtain corresponding data blocks; storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data blocks into a persistent memory; constructing a two-layer index according to each input data and the data blocks corresponding to the input data in the persistent memory; and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read target data blocks from the persistent memory to the dynamic random access memory, so as to execute data processing according to the target data blocks and the model parameters. The method makes efficient use of heterogeneous memory and improves the concurrent memory-access efficiency of the memory.

Description

Concurrent access method, device, equipment and storage medium for heterogeneous memory pool
Technical Field
The present invention relates to the field of memory access technologies, and in particular, to a method, an apparatus, a device, and a storage medium for concurrent access of heterogeneous memory pools.
Background
The conventional memory is composed of a single type of dynamic random access memory (DRAM, Dynamic Random Access Memory), but with the rapid development of advanced memory technologies in recent years, heterogeneous memory architectures built from multiple memory technologies are emerging, such as architectures combining dynamic random access memory with nonvolatile memory. An artificial intelligence data processing model generally needs a memory pool to store both the data set initially required by an application and the temporary results it generates; how to better use heterogeneous memory during data processing is a problem to be solved at present.
Disclosure of Invention
Accordingly, the present invention aims to provide a concurrent access method, apparatus, device and storage medium for a heterogeneous memory pool, which can efficiently utilize heterogeneous memory and improve the concurrent access efficiency of the memory. The specific scheme is as follows:
In a first aspect, the invention discloses a concurrent access method for heterogeneous memory pools, comprising the following steps:
acquiring input data related to a data processing model, and splitting the input data to obtain corresponding data blocks;
storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory;
constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory;
and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
Optionally, splitting the input data to obtain corresponding data blocks includes:
splitting the input data according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule, to obtain data blocks whose size is larger than, and proportional to, the size of the model parameters.
Optionally, splitting the input data according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule includes:
calculating a first ratio of the width of the input data to the width of the model parameters, and taking the product of the first ratio and a first scale parameter as the width of the data block;
calculating a second ratio of the length of the input data to the length of the model parameters, and taking the product of the second ratio and a second scale parameter as the length of the data block;
splitting the input data according to the length and the width corresponding to the data block.
Optionally, splitting the input data according to the length and the width corresponding to the data block includes: acquiring a repetition width, and splitting the input data according to the length and the width corresponding to the data block and the repetition width, so that two adjacent data blocks contain partially repeated data.
Optionally, the acquiring the repetition width includes:
and determining the repetition width according to the characteristic of the convolution operation.
Optionally, the determining the repetition width according to the characteristic of the convolution operation includes:
obtaining a step length corresponding to convolution operation, and calculating a difference value between the width of the model parameter and the step length;
comparing the difference value with the value of the step length, and taking the larger value as the repetition width.
Optionally, reading the target data block from the persistent memory to the dynamic random access memory includes:
reading a target data block from the persistent memory to the dynamic random access memory according to the size of the persistent memory, the size of the dynamic random access memory, the memory occupation size of the input data, and the memory occupation size of the model parameters.
Optionally, before the target data block is read from the persistent memory to the dynamic random access memory according to the size of the persistent memory and the size of the dynamic random access memory, and the memory occupation size of the input data and the memory occupation size of the model parameter, the method further includes:
determining the size of the heterogeneous memory in the heterogeneous memory pool; the heterogeneous memory comprises a persistent memory and a dynamic random access memory;
and determining the memory occupation size of the input data and the memory occupation size of the model parameters.
Optionally, the determining the memory occupancy size of the input data and the memory occupancy size of the model parameter includes:
and respectively calculating the memory occupation sizes of the input data and the model parameters according to the size parameters and the target data types respectively corresponding to the input data and the model parameters.
Optionally, the size parameters include length, width, and channel number; the target data type is floating point type.
Optionally, the calculating the memory occupation sizes of the input data and the model parameters respectively includes:
calculating the product of the length, width, channel number and bit number of the target data type corresponding to the input data to obtain the memory occupation size of the input data;
And calculating the product of the length, the width, the channel number and the bit number of the target data type corresponding to the model parameters to obtain the memory occupation size of the model parameters.
Optionally, the accessing the persistent memory in parallel according to the read request and the two-layer index to read the target data block from the persistent memory to the dynamic random access memory includes:
accessing the persistent memory in parallel according to the read request and the two-layer index so as to read a target data block from the persistent memory to temporarily store the target data block into a first cache queue;
and reading the target data block from the first cache queue to the dynamic random access memory in sequence, and carrying out data parallel processing by combining the model parameters.
Optionally, the reading the target data block from the first cache queue sequentially to the dynamic random access memory, and performing data parallel processing in combination with the model parameter includes:
temporarily storing the processing result output by each data processing task into a second buffer queue;
and splicing the processing results output by the single data processing task in the second buffer queue according to the sequence to obtain a complete merging result corresponding to each data processing task.
Optionally, after the processing results output by the single data processing task in the second buffer queue are spliced in order to obtain a complete merging result corresponding to each data processing task, the method further includes:
and reading the complete merging result from a second cache queue positioned in the dynamic random access memory to the persistent memory for storage.
Optionally, the constructing a two-layer index according to each input data in the persistent memory and the data block corresponding to the input data includes:
constructing a first-layer index according to each input data in the persistent memory;
and constructing a second layer index according to the data block corresponding to the input data on the basis of the first layer index.
Optionally, the constructing a first layer index according to each input data in the persistent memory includes:
and constructing an unordered index according to each input data in the persistent memory, and taking the unordered index as the first-layer index.
Optionally, the constructing an unordered index according to each input data in the persistent memory includes:
and constructing a disorder index based on a hash index mode and according to each input data in the persistent memory.
Optionally, the constructing a second layer index according to the data block corresponding to the input data based on the first layer index includes:
on the basis of the first layer index, an ordered index is constructed according to the data blocks corresponding to each piece of input data and the sequence of the data blocks;
the ordered index is used as the second layer index to obtain a mixed index for the input data.
Optionally, the constructing an ordered index according to the data block corresponding to each input data and the order of the data blocks includes:
and constructing unordered indexes based on a B+ tree index mode and according to the data blocks corresponding to the input data and the sequence of the data blocks.
In a second aspect, the present invention discloses a heterogeneous memory pool concurrent access device, including:
the data block splitting module is used for acquiring input data related to the data processing model and splitting the input data to obtain corresponding data blocks;
the storage module is used for storing the model parameters related to the data processing model into a dynamic random access memory and storing the data block into a persistent memory;
the index construction module is used for constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory;
And the memory access module is used for acquiring read requests of at least two data processing tasks, accessing the persistent memory in parallel according to the read requests and the two-layer index, and reading a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
In a third aspect, the present invention discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the heterogeneous memory pool concurrent access method.
In a fourth aspect, the present invention discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the heterogeneous memory pool concurrent access method.
In the invention, input data related to a data processing model is obtained, and the input data is split to obtain corresponding data blocks; storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory; constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory; and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
Therefore, in this embodiment, by splitting input data, storing the input data into the persistent memory according to the characteristics of the input data and the model parameters, and storing the model parameters into the dynamic random access memory, the heterogeneous memory can be fully utilized, and a two-layer index is built for the data blocks in the persistent memory, so that the target data blocks are read from the persistent memory to the dynamic random access memory in parallel for processing through the two-layer index during data processing, the data indexing efficiency is improved, the heterogeneous memory can be efficiently utilized, the concurrent access efficiency of the memory is improved, and the performance of the whole data processing is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only embodiments of the present invention, and other drawings may be obtained according to the provided drawings without inventive effort for those skilled in the art.
FIG. 1 is a flow chart of a concurrent access method for heterogeneous memory pools provided by the invention;
FIG. 2 is a schematic diagram of a specific multi-task parallel access processing flow provided by the present invention;
FIG. 3 is a schematic diagram illustrating the division and merging of input data according to one embodiment of the present invention;
FIG. 4 is a flowchart of a concurrent access method for a specific heterogeneous memory pool according to the present invention;
FIG. 5 is a schematic diagram illustrating a specific overlap of data block boundaries according to the present invention;
FIG. 6 is a flowchart of a concurrent access method for a heterogeneous memory pool according to the present invention;
FIG. 7 is a schematic diagram of a two-layer index structure according to the present invention;
FIG. 8 is a schematic structural diagram of a concurrent access device for a heterogeneous memory pool provided by the present invention;
FIG. 9 is a block diagram of an electronic device according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to better utilize heterogeneous memory in the data processing process, the embodiment provides a concurrent access method of a heterogeneous memory pool, which can efficiently utilize the heterogeneous memory and improve the concurrent access efficiency of the memory.
The embodiment of the invention discloses a concurrent access method of heterogeneous memory pools, which is shown in fig. 1, and can comprise the following steps:
step S11: and acquiring input data related to the data processing model, and splitting the input data to obtain corresponding data blocks.
In this embodiment, input data related to the data processing model is first obtained. It can be understood that the data in a data processing model generally has two parts, input data and model parameters: the input data is usually large in volume but accessed relatively few times, while the model parameters are relatively small but accessed frequently. Because the amount of input data is large, often hundreds of gigabytes, the traditional scheme places it in a sufficiently large external storage and reads it sequentially on access; but the data is then read frequently during memory access, the reading efficiency is low, and a performance bottleneck can result. To accelerate access, this embodiment makes full use of the persistent memory and stores the data in the persistent memory in blocks.
Step S12: and storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory.
First, the heterogeneous memory pool, a technique for managing heterogeneous memory systems, is explained. A heterogeneous memory system is composed of different types of memory, such as conventional DRAM (dynamic random access memory) and non-volatile memory (NVM). The heterogeneous memory pool realizes unified management and utilization of these different memory resources. For example, persistent memory provides a storage solution whose performance is close to DRAM but at lower cost, which is very advantageous for applications with significant memory consumption. However, the introduction of a multi-level storage architecture also raises higher challenges for performance optimization. High-performance caching is significant in performance tuning: on one hand, hot spots often exist in real data, and a cache can effectively improve the access performance of hot-spot data; on the other hand, cache-sensitive data structures are often elaborately designed to squeeze out hardware performance. The presence of persistent memory thus makes the storage hierarchy more complex, placing higher demands on the design of multi-level caching mechanisms, data structures, and algorithms.
The main objective of the heterogeneous memory pool is to provide a unified interface and management mechanism, so that applications can effectively utilize the heterogeneous memory system and optimize its performance and energy consumption. Some key features and functions of the heterogeneous memory pool are the following:
1. Unified management interface. The heterogeneous memory pool provides a unified management interface through which applications can manage and operate heterogeneous memory resources, making it more convenient to access different types of memory.
2. Memory classification and policy. The heterogeneous memory pool allows the memory to be divided into multiple levels and defines an access policy for each level; in this way, applications can place data on the appropriate memory tier depending on access patterns and data access requirements, achieving higher performance and energy efficiency.
3. Data migration and replication. The heterogeneous memory pool can automatically migrate and copy data between memory tiers according to access patterns and requirements; this improves access efficiency, keeps hot data in faster memory, and reduces data access latency.
4. Prefetching and cache management. Through prefetching and cache-management techniques, the heterogeneous memory pool can predict an application's data access pattern and load data into the appropriate memory tier in advance, reducing data access latency and improving application performance.
5. Programming interfaces and tool support. A heterogeneous memory pool generally provides corresponding programming interfaces and tools so that developers can conveniently utilize heterogeneous memory resources; these interfaces and tools help developers achieve efficient storage and access of data and optimize application performance.
A heterogeneous memory pool can be implemented in a variety of ways, including hardware support and software drivers; the specific implementation may vary with the vendor, architecture, and system design, and heterogeneous memory pools may carry different names and implementation details on different platforms and systems. Heterogeneous memory pool technology is critical to improving the efficiency and performance of heterogeneous memory systems: it allows applications to make full use of the advantages of different memory types and to trade off performance against energy efficiency, adapting to increasingly diverse application requirements.
In this embodiment, the heterogeneous memory pool may include persistent memory (PMEM) and dynamic random access memory, which have different characteristics. Persistent memory is large but slower to access: a single module can reach 512 GB. DRAM is smaller but faster to access: a single module is typically 32 GB. The data processing model parameters are accessed frequently and their data volume is small, typically tens to hundreds of megabytes. Therefore, the model parameters are stored in their entirety in the dynamic random access memory, improving access efficiency.
In this embodiment, reading the target data block from the persistent memory to the dynamic random access memory may include: reading a target data block from the persistent memory to the dynamic random access memory according to the size of the persistent memory, the size of the dynamic random access memory, the memory occupation size of the input data, and the memory occupation size of the model parameters. When the target data block is read from the persistent memory to the dynamic random access memory, the size of the dynamic random access memory, the memory occupation size of the target data block and the memory occupation size of the model parameters need to be considered, so that the memory is not exceeded; the memory occupation size of the target data block is determined from the input data size and the number of data blocks.
In this embodiment, before reading the target data block from the persistent memory to the dynamic random access memory according to these sizes, the method may further include: determining the size of each heterogeneous memory in the heterogeneous memory pool, the heterogeneous memory comprising a persistent memory and a dynamic random access memory; and determining the memory occupation size of the input data and the memory occupation size of the model parameters. That is, before the data block is read, the sizes of the persistent memory and the dynamic random access memory, the memory occupation size of the input data, and the memory occupation size of the model parameters need to be determined.
In this embodiment, the determining the memory occupation size of the input data and the memory occupation size of the model parameter may include: and respectively calculating the memory occupation sizes of the input data and the model parameters according to the size parameters and the target data types respectively corresponding to the input data and the model parameters. Specifically, the memory occupation size of the input data is calculated according to the length, the width, the channel number and the target data type corresponding to the input data, and the memory occupation size of the model parameters is calculated according to the length, the width, the channel number and the target data type corresponding to the model parameters.
In this embodiment, the size parameters may include length, width, and number of channels, and the target data type may be a floating point type. Calculating the memory occupation sizes of the input data and the model parameters respectively may include: calculating the product of the length, width, number of channels and bit width of the target data type corresponding to the input data to obtain the memory occupation size of the input data; and calculating the product of the length, width, number of channels and bit width of the target data type corresponding to the model parameters to obtain the memory occupation size of the model parameters. For example, if the length, width and number of channels of the input data are (H, W, C), those of the model parameters are (K_h, K_w, K_c), and the data type is a 32-bit floating point, then the memory occupation size of the input data is Data = W × H × C × 32 and that of the model parameters is Param = K_h × K_w × K_c × 32 (in bits).
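As an illustration only, the sizing arithmetic above can be written out as follows; the concrete shapes, block count and helper names are our own assumptions, not values from the patent:

```python
def footprint_bits(length: int, width: int, channels: int, dtype_bits: int = 32) -> int:
    """Memory occupation of a (length, width, channels) tensor of dtype_bits-wide floats."""
    return length * width * channels * dtype_bits

# Assumed example shapes: input data (H, W, C), model parameters (K_h, K_w, K_c).
H, W, C = 1024, 1024, 64
K_h, K_w, K_c = 3, 3, 64

data_bits = footprint_bits(H, W, C)          # Data  = W * H * C * 32
param_bits = footprint_bits(K_h, K_w, K_c)   # Param = K_h * K_w * K_c * 32

# Before staging blocks, check them against the DRAM that remains once the
# model parameters are resident (cf. step S12), so DRAM is never exceeded.
DRAM_BITS = 32 * 2**30 * 8                   # e.g. a 32 GB DRAM module
num_blocks = 16                              # assumed number of data blocks
block_bits = data_bits // num_blocks         # block footprint = input size / block count
blocks_in_dram = (DRAM_BITS - param_bits) // block_bits
print(blocks_in_dram)
```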
Step S13: and constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory.
In order to accelerate search efficiency during data access in a data processing model, a set of indexes is usually built for data lookup. The traditional approach builds only a single index; however, under different data distribution characteristics and memory-access efficiencies, different indexes show markedly different lookup performance. Therefore, in this embodiment, an adaptive two-layer index is constructed according to the data properties and the storage structure; that is, for the input data, a two-layer index is built over the partitioned data so that data can be found more efficiently.
Step S14: and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters. And through concurrent access and calculation for the heterogeneous memory pool, the storage access of the high-flux application data and the prefetching and migration processes of intermediate results are accelerated. And various types of memory resources are efficiently utilized to support the requirement of great computational power for deep learning in artificial intelligence data processing.
In this embodiment, the persistent memory supports parallel reads from multiple tasks, enabling parallel multi-task processing: the partitioned data blocks are accessed in parallel, and since there is no interdependence between data blocks, access and computation proceed concurrently. There may be one or more target data blocks, and thanks to the data block division, a single data block can be combined directly with the model parameters for data processing.
In this embodiment, accessing the persistent memory in parallel according to the read request and the two-layer index to read the target data block from the persistent memory to the dynamic random access memory may include: accessing the persistent memory in parallel according to the read request and the two-layer index, so as to read a target data block from the persistent memory and temporarily store it in a first cache queue; and reading the target data blocks from the first cache queue to the dynamic random access memory in sequence, and performing parallel data processing in combination with the model parameters. It can be understood that when different tasks perform data access and processing, in order to ensure independence among the tasks, a cache queue following the first-in-first-out principle is added between the persistent memory and the DRAM. Specifically, as shown in fig. 2, the data blocks to be processed by each task are written into the queue, and when data blocks are accessed, multiple computing tasks are started to read data from the queue simultaneously for computation.
In this embodiment, reading the target data blocks from the first cache queue to the dynamic random access memory in sequence and performing parallel data processing in combination with the model parameters may include: temporarily storing the processing result output by each data processing task in a second cache queue; and splicing, in order, the processing results output by a single data processing task in the second cache queue to obtain the complete merged result corresponding to each data processing task. After computation completes, the results are stored in a result cache queue for subsequent merging. For example, as shown in fig. 3, the data blocks are each operated on with the model parameters to obtain corresponding computation results, and all computation results are then spliced in order to obtain the final processing result of the task.
In this embodiment, after the processing results output by a single data processing task in the second cache queue are spliced in order to obtain the complete merged result corresponding to each data processing task, the method may further include: reading the complete merged result from the second cache queue located in the dynamic random access memory to the persistent memory for storage. Since the cache queue resides in the dynamic random access memory, the complete merged result is read from the dynamic random access memory to the persistent memory for storage once it has been obtained, avoiding a shortage of dynamic random access memory space.
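A minimal single-process sketch of this queueing scheme using Python threads; convolve, concatenate and write_back_to_pmem are stand-ins we introduce for the real persistent-memory and compute operations, and the toy data is ours:

```python
import queue
import threading

block_queue = queue.Queue(maxsize=8)   # first cache queue (FIFO): PMEM -> DRAM staging
result_queue = queue.Queue()           # second cache queue: per-block results

def convolve(block, params):           # stand-in for block x model-parameter compute
    return [x * params for x in block]

def concatenate(parts):                # stand-in for ordered splicing of partial results
    return [x for part in parts for x in part]

def write_back_to_pmem(result):        # stand-in for persisting the merged result
    pass

def reader(task_blocks):
    """Producer: reads target blocks from persistent memory via the two-layer index."""
    for seq, block in enumerate(task_blocks):
        block_queue.put((seq, block))  # stage the block in FIFO order
    block_queue.put(None)              # sentinel: no more blocks

def worker(params):
    """Consumer: combines each staged block with the DRAM-resident model parameters."""
    while True:
        item = block_queue.get()
        if item is None:
            block_queue.put(None)      # re-post sentinel for sibling workers
            return
        seq, block = item
        result_queue.put((seq, convolve(block, params)))

def merge_results():
    """Splice a task's per-block results in block order, then persist the merged result."""
    parts = []
    while not result_queue.empty():
        parts.append(result_queue.get())
    merged = concatenate(part for _, part in sorted(parts))
    write_back_to_pmem(merged)         # free DRAM once the complete result exists
    return merged

blocks = [[1, 2], [3, 4], [5, 6]]      # toy data blocks
threads = [threading.Thread(target=worker, args=(10,)) for _ in range(2)]
for t in threads:
    t.start()
reader(blocks)
for t in threads:
    t.join()
print(merge_results())                 # [10, 20, 30, 40, 50, 60]
```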
From the above, in this embodiment, input data related to a data processing model is obtained, and the input data is split to obtain corresponding data blocks; storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory; constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory; and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
Therefore, in this embodiment, by splitting the input data, storing the input data into the persistent memory according to the characteristics of the input data and the model parameters, and storing the model parameters into the dynamic random access memory, the heterogeneous memory can be fully utilized, and the two-layer index is built for the data blocks in the persistent memory, so that the target data blocks are read from the persistent memory to the dynamic random access memory in parallel for processing through the two-layer index during data processing, the heterogeneous memory can be efficiently utilized, the concurrent access efficiency of the memory is improved, and the performance of the whole data processing is further improved.
The embodiment of the invention discloses a specific heterogeneous memory pool concurrent access method, which is shown in fig. 4 and can comprise the following steps:
step S21: input data associated with the data processing model is acquired.
Step S22: splitting the input data according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule to obtain a data block with the size larger than the model parameters and proportional to the size of the model parameters.
In this embodiment, according to the size parameters corresponding to the input data and the model parameters, the input data is split according to a preset splitting rule to obtain a data block with a size larger than the model parameters and proportional to the size of the model parameters, that is, the size of the data block obtained by splitting the preset splitting rule is larger than the model parameters, and the size of the data block is proportional to the size of the model parameters, so that the condition is satisfied.
In this embodiment, splitting the input data according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule may include: calculating a first ratio of the width of the input data to the width of the model parameters, and taking the product of the first ratio and a first scale parameter as the width of the data block; calculating a second ratio of the length of the input data to the length of the model parameters, and taking the product of the second ratio and a second scale parameter as the length of the data block; and splitting the input data according to the length and the width corresponding to the data block. In other words, this embodiment provides a concrete input data splitting scheme. Taking the length, width and number of channels of the input data as (H, W, C) and those of the model parameters as (K_h, K_w, K_c) as an example, the data is divided along the length and the width according to the priority order of data access, and the division ratio follows the ratio of the input data's length and width to those of the model parameters: the divided width = W / K_w × δ, where δ is the first scale parameter, and the divided length = H / K_h × β, where β is the second scale parameter. The data is divided along the length and the width but not along the channel dimension because the convolution computation requires the data of all channels; if the data were divided along channels, a single data block could no longer be combined with the model parameters for computation.
In this embodiment, splitting the input data according to the length and the width corresponding to the data block may include: acquiring a repetition width, and splitting the input data according to the length and the width corresponding to the data blocks and the repetition width, so that two adjacent data blocks contain partially repeated data. It can be understood that during convolution, the window moves step by step according to the step length, so the data framed by the window each time overlaps the previous data. To adapt to the convolution operation and ensure that a single data block can be operated on with the model parameters, the input data is split according to the length, the width and the repetition width corresponding to the data block. For example, as shown in fig. 5, the data between the dashed lines is repeated across the 6 data blocks.
In this embodiment, acquiring the repetition width may include: determining the repetition width according to the characteristics of the convolution operation. That is, the repetition width mainly serves to adapt the split to the convolution operation, so it can be determined from the characteristics of that operation; for example, the step length of the convolution operation can be selected as the repetition width.
In this embodiment, determining the repetition width according to the characteristics of the convolution operation may include: obtaining the step length (stride) corresponding to the convolution operation, and calculating the difference between the width of the model parameters and the step length; comparing the difference with the step length, and taking the larger value as the repetition width. That is, in a specific manner, the repetition width is calculated from the width of the model parameters and the step length: W_c = max{K_w − stride, stride}, where K_w is the width of the model parameters, stride is the step length, and max{} denotes taking the maximum.
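For illustration, the splitting rule and the repetition width above can be sketched as follows; the helper names and example values are ours, with δ and β as the scale parameters and the overlap following W_c = max{K_w − stride, stride}:

```python
def block_dims(H, W, K_h, K_w, delta, beta):
    """Length and width of one data block, proportional to the parameter dims."""
    block_w = (W // K_w) * delta      # divided width  = W / K_w * delta
    block_h = (H // K_h) * beta       # divided length = H / K_h * beta
    return block_h, block_w

def repetition_width(K_w, stride):
    """Overlap of adjacent blocks so each block convolves with the parameters alone."""
    return max(K_w - stride, stride)  # W_c = max{K_w - stride, stride}

def split_starts(W, block_w, overlap):
    """Start offsets of blocks along the width; adjacent blocks share `overlap` columns."""
    assert block_w > overlap, "block must be wider than the overlap"
    starts, x = [], 0
    while x + block_w < W:
        starts.append(x)
        x += block_w - overlap        # advance, keeping the repeated border region
    starts.append(x)                  # final block covers the remainder
    return starts

# Assumed example: 1024-wide input, 3-wide kernel, stride 1, delta = beta = 2.
_, bw = block_dims(1024, 1024, 3, 3, 2, 2)
print(split_starts(1024, bw, repetition_width(3, 1)))
```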
Step S23: and storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory.
Step S24: and constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory.
Step S25: and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
The specific process of the steps S21 and S23-S25 may refer to the corresponding content disclosed in the foregoing embodiment, and will not be described herein.
As can be seen from the above, in this embodiment, the input data is split according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule, so as to obtain data blocks whose size is larger than, and proportional to, the size of the model parameters; during data processing, the target data blocks are read in parallel from the persistent memory to the dynamic random access memory through the two-layer index for processing, so that the heterogeneous memory is utilized efficiently, the concurrent memory-access efficiency is improved, and the performance of the whole data processing is further improved.
The embodiment of the invention discloses a specific heterogeneous memory pool concurrent access method, which is shown in fig. 6 and can comprise the following steps:
step S31: and acquiring input data related to the data processing model, and splitting the input data to obtain corresponding data blocks.
Step S32: and storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory.
Step S33: and constructing a first-layer index according to each input data in the persistent memory.
In this embodiment, constructing the first-layer index according to each input data in the persistent memory may include: constructing an unordered index according to each input data in the persistent memory, and taking the unordered index as the first-layer index.
In this implementation, a first-layer index is first constructed according to each input data in the persistent memory. The input data are mutually independent, with no sequential relation among them, so an unordered index with faster queries can be constructed in the first layer; this first-layer unordered index is used to retrieve the input data. For example, as shown in fig. 7, the unordered index is constructed from the input data (input 1, input 2 … input n).
In this embodiment, constructing the unordered index according to each input data in the persistent memory may include: constructing the unordered index based on a hash index mode and according to each input data in the persistent memory. Preferably, the unordered index of the first layer is constructed in a hash index manner; of course, other unordered index manners may also be used.
Step S34: and constructing a second layer index according to the data block corresponding to the input data on the basis of the first layer index.
In this embodiment, on the basis of the first-layer index, a second-layer index is constructed according to the data blocks corresponding to the input data; the second-layer index is used to retrieve a specific data block. That is, when reading a data block, the input data is first located through the first-layer index, and then the specific target data block is located through the second-layer index.
In this embodiment, constructing the second-layer index according to the data blocks corresponding to the input data on the basis of the first-layer index may include: on the basis of the first-layer index, constructing an ordered index according to the data blocks corresponding to each input data and the sequence of the data blocks; and using the ordered index as the second-layer index to obtain a hybrid index for the input data. It can be understood that the partitioned data need to be stored in the persistent memory in blocks; although the blocks are mutually independent, they follow a sequential order, so the second layer is better suited to an ordered index adapted to ordered lookup. Once the hybrid index is built, data lookup can proceed more efficiently.
In this embodiment, constructing the ordered index according to the data blocks corresponding to each input data and the sequence of the data blocks may include: constructing an ordered index based on a B+ tree index mode and according to the data blocks corresponding to the input data and the sequence of the data blocks. Preferably, the ordered index of the second layer is constructed in a B+ tree index manner; other ordered index manners may also be adopted.
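As a simplified model of this hybrid index (an illustrative sketch rather than the patent's implementation): the first layer is a hash index, here a Python dict keyed by an input-data ID, and a sorted list stands in for the B+ tree of the ordered second layer:

```python
import bisect

class TwoLayerIndex:
    """Layer 1: unordered hash index over input data.
    Layer 2: ordered index over each input's data blocks (B+-tree stand-in)."""

    def __init__(self):
        self._inputs = {}                        # input_id -> (sorted block nos, addresses)

    def insert(self, input_id, block_no, pmem_addr):
        keys, addrs = self._inputs.setdefault(input_id, ([], []))
        pos = bisect.bisect_left(keys, block_no)
        keys.insert(pos, block_no)               # keep block numbers ordered,
        addrs.insert(pos, pmem_addr)             # mirroring the blocks' sequence

    def lookup(self, input_id, block_no):
        entry = self._inputs.get(input_id)       # layer 1: O(1) hash probe
        if entry is None:
            return None
        keys, addrs = entry
        pos = bisect.bisect_left(keys, block_no)  # layer 2: ordered search
        return addrs[pos] if pos < len(keys) and keys[pos] == block_no else None

    def range_lookup(self, input_id, lo, hi):
        """The ordered layer also serves sequential reads of blocks lo..hi."""
        keys, addrs = self._inputs.get(input_id, ([], []))
        i, j = bisect.bisect_left(keys, lo), bisect.bisect_right(keys, hi)
        return addrs[i:j]

idx = TwoLayerIndex()
for n in (2, 0, 1):
    idx.insert("input1", n, f"pmem://input1/block{n}")
print(idx.lookup("input1", 1))                   # pmem://input1/block1
print(idx.range_lookup("input1", 0, 2))
```

A production version would replace the sorted-list stand-in with a real B+ tree laid out in persistent memory, so that the ordered layer itself survives restarts.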
In this embodiment, starting from the two aspects of the memory characteristics of the storage medium and of the data processing model, a hybrid index construction for concurrent memory access over a multi-level memory pool is provided: a multi-level directory structure is built for storage based on the data-block grouping of the data processing model, and hybrid index construction under different memory characteristics is realized through a two-layer hash-index and B+ tree index structure, improving the efficiency of data lookup.
Step S35: and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
The specific processes of the steps S31, S32, and S35 may refer to the corresponding contents disclosed in the foregoing embodiments, and will not be described herein.
As can be seen from the above, in this embodiment, a first-layer index is constructed according to each input data in the persistent memory, and a second-layer index is constructed according to the data blocks corresponding to the input data on the basis of the first-layer index. Through data division and hybrid index construction, the persistent memory in the heterogeneous memory is utilized efficiently, and access and computation of data blocks can proceed in parallel. Concurrent access to the heterogeneous memory pool can use memories with different characteristics efficiently according to the different characteristics of the data, improving memory access efficiency to the greatest extent and improving data processing performance. This implementation makes full use of all underlying hardware resources and, by deploying multiple kinds of memory resources on a server platform, supports the huge computing power requirements of artificial intelligence data processing models.
Correspondingly, the embodiment of the invention also discloses a heterogeneous memory pool concurrent access device, as shown in fig. 8, including:
the data block splitting module 11 is configured to obtain input data related to a data processing model, and split the input data to obtain a corresponding data block;
A storage module 12, configured to store the model parameters related to the data processing model into a dynamic random access memory, and store the data block into a persistent memory;
the index construction module 13 is configured to construct a two-layer index according to each input data and a data block corresponding to the input data in the persistent memory;
the memory access module 14 is configured to obtain a read request of at least two data processing tasks, and access the persistent memory in parallel according to the read request and the two-layer index, so as to read a target data block from the persistent memory to the dynamic random access memory, so as to perform data processing according to the target data block and the model parameter.
From the above, in this embodiment, input data related to a data processing model is obtained, and the input data is split to obtain corresponding data blocks; storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory; constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory; and acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters.
Therefore, in this embodiment, by splitting the input data, storing the input data into the persistent memory according to the characteristics of the input data and the model parameters, and storing the model parameters into the dynamic random access memory, the heterogeneous memory can be fully utilized, and the two-layer index is built for the data blocks in the persistent memory, so that the target data blocks are read from the persistent memory to the dynamic random access memory in parallel for processing through the two-layer index during data processing, the heterogeneous memory can be efficiently utilized, the concurrent access efficiency of the memory is improved, and the performance of the whole data processing is further improved.
In some specific embodiments, the data block splitting module 11 may specifically include:
and the data block splitting unit is used for splitting the input data according to the size parameters corresponding to the input data and the model parameters respectively and a preset splitting rule so as to obtain a data block with the size larger than the model parameters and in proportion to the size of the model parameters.
In some specific embodiments, the data block splitting unit may specifically include:
a width determining unit, used for calculating a first ratio of the width of the input data to the width of the model parameters, and taking the product of the first ratio and a first scale parameter as the width of the data block;
a length determining unit, used for calculating a second ratio of the length of the input data to the length of the model parameters, and taking the product of the second ratio and a second scale parameter as the length of the data block;
and the splitting unit is used for splitting the input data according to the length and the width corresponding to the data block.
In some embodiments, the splitting unit may specifically include:
a repetition width acquisition unit, used for acquiring the repetition width, and splitting the input data according to the length and the width corresponding to the data blocks and the repetition width, so that two adjacent split data blocks contain partially repeated data.
In some specific embodiments, the repetition width obtaining unit may specifically include:
and the repetition width determining unit is used for determining the repetition width according to the characteristic of the convolution operation.
In some specific embodiments, the repetition width determining unit may specifically include:
a difference calculation unit, used for obtaining the step length corresponding to the convolution operation and calculating the difference between the width of the model parameters and the step length;
and a repetition width selection unit, used for comparing the difference with the step length and taking the larger value as the repetition width.
In some embodiments, the memory access module 14 may specifically include:
and the memory parallel access unit is used for reading a target data block from the persistent memory to the dynamic random access memory according to the size of the persistent memory and the size of the dynamic random access memory, as well as the memory occupation size of the input data and the memory occupation size of the model parameter.
In some embodiments, the memory access module 14 may specifically include:
the heterogeneous memory size determining unit is used for determining the size of the heterogeneous memory in the heterogeneous memory pool; the heterogeneous memory comprises a persistent memory and a dynamic random access memory;
and the data memory occupation determining unit is used for determining the memory occupation size of the input data and the memory occupation size of the model parameters.
In some embodiments, the data memory occupancy determining unit may specifically include:
and the memory occupation size calculation unit is used for calculating the memory occupation sizes of the input data and the model parameters according to the size parameters and the target data types respectively corresponding to the input data and the model parameters.
In some embodiments, the size parameters may include length, width, and channel number; the target data type may be a floating-point type.
In some embodiments, the memory occupation size calculation unit may specifically include:
an input data memory occupation size calculation unit, used for calculating the product of the length, the width, the channel number, and the bit number of the target data type corresponding to the input data, to obtain the memory occupation size of the input data;
and a model parameter memory occupation size calculation unit, used for calculating the product of the length, the width, the channel number, and the bit number of the target data type corresponding to the model parameters, to obtain the memory occupation size of the model parameters.
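Both units thus evaluate the same product; a short sketch, with float32 (32 bits) assumed as the target data type:

```python
def memory_occupation_bytes(length, width, channels, dtype_bits=32):
    """length x width x channels x (bit number of the target data type),
    converted from bits to bytes."""
    return length * width * channels * dtype_bits // 8

# e.g. a 224 x 224 x 3 float32 input occupies
# 224 * 224 * 3 * 32 / 8 = 602,112 bytes (about 588 KiB)
```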
In some embodiments, the memory access module 14 may specifically include:
the data block reading unit is used for accessing the persistent memory in parallel according to the read request and the two-layer index, so as to read a target data block from the persistent memory and temporarily store it in a first cache queue;
and the data processing unit is used for reading the target data blocks from the first cache queue into the dynamic random access memory in sequence and processing the data in parallel in combination with the model parameters.
In some embodiments, the memory access module 14 may specifically include:
the result buffer unit is used for temporarily storing the processing results output by each data processing task in a second cache queue;
and the splicing unit is used for splicing, in sequence, the processing results output by a single data processing task in the second cache queue, to obtain the complete merging result corresponding to each data processing task.
In some embodiments, the splicing unit may specifically include:
and the storage unit is used for reading the complete merging result from the second cache queue positioned in the dynamic random access memory to the persistent memory for storage.
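The two queues form a staged pipeline: a reader fills the first cache queue from persistent memory, workers drain it into the dynamic random access memory and process each block against the model parameters, and the splicer reassembles the results in sequence from the second cache queue. A minimal threaded sketch follows; process() is a stand-in for the real per-block computation, and all names are assumed.

```python
import queue
import threading

def process(block, params):
    """Placeholder for the per-block computation (e.g. a convolution)."""
    return block

def read_stage(blocks, first_q, n_workers):
    """Stage target data blocks (a plain list stands in for persistent
    memory here) into the first cache queue, then signal completion."""
    for seq, blk in enumerate(blocks):
        first_q.put((seq, blk))
    for _ in range(n_workers):
        first_q.put(None)                    # one sentinel per worker

def work_stage(first_q, second_q, params):
    """Move blocks from the first cache queue into DRAM, process them
    with the model parameters, and emit results to the second queue."""
    while (item := first_q.get()) is not None:
        seq, blk = item
        second_q.put((seq, process(blk, params)))

def splice_stage(second_q, n_blocks):
    """Splice per-block results in sequence into the complete merged
    result (which would then be written back to persistent memory)."""
    out = {}
    while len(out) < n_blocks:
        seq, res = second_q.get()
        out[seq] = res
    return [out[i] for i in range(n_blocks)]

if __name__ == "__main__":
    blocks, params = list(range(8)), None
    q1, q2 = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=work_stage, args=(q1, q2, params))
               for _ in range(2)]
    for w in workers:
        w.start()
    read_stage(blocks, q1, n_workers=2)
    print(splice_stage(q2, len(blocks)))     # [0, 1, ..., 7]
    for w in workers:
        w.join()
```

Ordering is preserved by tagging each block with its sequence number rather than by serializing the workers, so the parallelism of the read and processing stages is retained.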
In some specific embodiments, the index building module 13 may specifically include:
the first layer index construction unit is used for constructing a first layer index according to each input data in the persistent memory;
and the second layer index construction unit is used for constructing a second layer index according to the data block corresponding to the input data on the basis of the first layer index.
In some specific embodiments, the first layer index building unit may specifically include:
and the unordered index construction unit is used for constructing an unordered index according to each input data in the persistent memory and taking the unordered index as the first-layer index.
In some embodiments, the first-layer index building unit may be specifically configured to build the unordered index based on a hash index manner and according to each of the input data in the persistent memory.
In some specific embodiments, the second-layer index building unit may specifically include:
the ordered index construction unit is used for constructing an ordered index according to the data block corresponding to each input data and the sequence of the data blocks on the basis of the first-layer index;
and the mixed index determining unit is used for taking the ordered index as the second layer index to obtain a mixed index aiming at the input data.
In some embodiments, the ordered index building unit may be specifically configured to build the ordered index based on a B+ tree index manner and according to the data blocks corresponding to the input data and the order of the data blocks.
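Putting the two layers together: an O(1) hash lookup selects the input data, after which an ordered structure locates blocks by sequence number, supporting both point lookups and sequential range scans. In the sketch below a bisect-maintained sorted list stands in for the B+ tree (a deliberate simplification), and the class and method names are assumptions.

```python
import bisect

class TwoLayerIndex:
    """Layer 1: unordered hash index keyed by input-data ID.
    Layer 2: per-input ordered index over block sequence numbers
    (a sorted list via bisect substitutes for the B+ tree)."""

    def __init__(self):
        self._inputs = {}                    # input_id -> (seqs, addrs)

    def insert(self, input_id, block_seq, pm_addr):
        seqs, addrs = self._inputs.setdefault(input_id, ([], []))
        pos = bisect.bisect_left(seqs, block_seq)
        seqs.insert(pos, block_seq)
        addrs.insert(pos, pm_addr)

    def lookup(self, input_id, block_seq):
        """Hash hop to the input, then an ordered search for the block."""
        seqs, addrs = self._inputs.get(input_id, ([], []))
        pos = bisect.bisect_left(seqs, block_seq)
        if pos < len(seqs) and seqs[pos] == block_seq:
            return addrs[pos]
        return None

    def scan(self, input_id, lo, hi):
        """Ordered range scan over block sequence numbers lo..hi, the
        sequential access pattern the ordered second layer serves."""
        seqs, addrs = self._inputs.get(input_id, ([], []))
        return addrs[bisect.bisect_left(seqs, lo):
                     bisect.bisect_right(seqs, hi)]
```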
Further, an embodiment of the invention also discloses an electronic device; referring to Fig. 9, the content of the drawing should not be considered as limiting the scope of the invention in any way.
Fig. 9 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present invention. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, where the computer program is loaded and executed by the processor 21 to implement relevant steps in the heterogeneous memory pool concurrent access method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide the operating voltage for each hardware device on the electronic device 20. The communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, following any communication protocol applicable to the technical solution of the invention, which is not specifically limited herein. The input/output interface 25 is used for acquiring external input data or outputting data externally, and its specific interface type may be selected according to the application requirements, which is likewise not limited herein.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, or an optical disk; the resources stored on it include an operating system 221, a computer program 222, and data 223 including the input data, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so as to implement the operation and processing by the processor 21 of the mass data 223 in the memory 22; it may be Windows Server, NetWare, Unix, Linux, and the like. In addition to the computer program that implements the heterogeneous memory pool concurrent access method disclosed in any of the foregoing embodiments and executed by the electronic device 20, the computer program 222 may further comprise computer programs for performing other specific tasks.
Further, the embodiment of the invention also discloses a computer storage medium, wherein the computer storage medium stores computer executable instructions, and when the computer executable instructions are loaded and executed by a processor, the steps of the heterogeneous memory pool concurrent access method disclosed in any embodiment are realized.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that relational terms such as first and second are used solely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or device that comprises the element.
The concurrent access method, device, equipment, and storage medium for heterogeneous memory pools provided by the invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the invention, and the description of the above embodiments is only intended to help in understanding the method and its core ideas. Since those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the invention, the contents of this description should not be construed as limiting the invention.

Claims (20)

1. A concurrent access method for a heterogeneous memory pool, characterized by comprising the following steps:
acquiring input data related to a data processing model, and splitting the input data to obtain corresponding data blocks;
storing the model parameters related to the data processing model into a dynamic random access memory, and storing the data block into a persistent memory;
constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory;
acquiring read requests of at least two data processing tasks, and accessing the persistent memory in parallel according to the read requests and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters;
the splitting the input data to obtain corresponding data blocks includes:
splitting the input data according to the size parameters respectively corresponding to the input data and the model parameters and a preset splitting rule, to obtain data blocks whose size is larger than, and proportional to, the size of the model parameters;
The splitting the input data according to the size parameters corresponding to the input data and the model parameters respectively and a preset splitting rule comprises the following steps:
calculating a first ratio of the width of the input data to the width of the model parameter, and taking the product of the first ratio and the first ratio parameter as the width of the data block;
calculating a second ratio of the length of the input data to the length of the model parameter, and taking the product of the second ratio and the second ratio parameter as the length of the data block;
splitting the input data according to the length and the width corresponding to the data block.
2. The heterogeneous memory pool concurrent access method according to claim 1, wherein splitting the input data according to the length and the width corresponding to the data block comprises:
and acquiring a repetition width, and splitting the input data according to the length and the width corresponding to the data blocks and the repetition width, so that two adjacent data blocks obtained by splitting contain partially repeated data.
3. The heterogeneous memory pool concurrent access method according to claim 2, wherein the obtaining the repetition width comprises:
determining the repetition width according to the characteristics of the convolution operation.
4. The heterogeneous memory pool concurrent access method according to claim 3, wherein the determining the repetition width according to the characteristics of the convolution operation comprises:
obtaining the stride corresponding to the convolution operation, and calculating the difference between the width of the model parameters and the stride;
comparing the difference with the stride, and taking the larger value as the repetition width.
5. The heterogeneous memory pool concurrent access method according to claim 1, wherein the reading the target data block from the persistent memory to the dynamic random access memory comprises:
and reading a target data block from the persistent memory to the dynamic random access memory according to the size of the persistent memory and the size of the dynamic random access memory, and the memory occupation size of the input data and the memory occupation size of the model parameter.
6. The heterogeneous memory pool concurrent access method according to claim 5, wherein before the target data block is read from the persistent memory to the dynamic random access memory according to the size of the persistent memory and the size of the dynamic random access memory, and the memory occupancy size of the input data and the memory occupancy size of the model parameter, further comprising:
Determining the size of the heterogeneous memory in the heterogeneous memory pool; the heterogeneous memory comprises a persistent memory and a dynamic random access memory;
and determining the memory occupation size of the input data and the memory occupation size of the model parameters.
7. The heterogeneous memory pool concurrent access method according to claim 6, wherein determining the memory footprint size of the input data and the memory footprint size of the model parameter comprises:
and respectively calculating the memory occupation sizes of the input data and the model parameters according to the size parameters and the target data types respectively corresponding to the input data and the model parameters.
8. The heterogeneous memory pool concurrent access method of claim 7, wherein the size parameters include length, width and channel number; the target data type is floating point type.
9. The heterogeneous memory pool concurrent access method according to claim 8, wherein the calculating the memory occupancy sizes of the input data and the model parameters respectively includes:
calculating the product of the length, width, channel number and bit number of the target data type corresponding to the input data to obtain the memory occupation size of the input data;
And calculating the product of the length, the width, the channel number and the bit number of the target data type corresponding to the model parameters to obtain the memory occupation size of the model parameters.
10. The heterogeneous memory pool concurrent access method according to claim 1, wherein the accessing the persistent memory in parallel according to the read request and the two-layer index to read a target data block from the persistent memory to the dynamic random access memory comprises:
accessing the persistent memory in parallel according to the read request and the two-layer index so as to read a target data block from the persistent memory to temporarily store the target data block into a first cache queue;
and reading the target data block from the first cache queue to the dynamic random access memory in sequence, and carrying out data parallel processing by combining the model parameters.
11. The heterogeneous memory pool concurrent access method according to claim 10, wherein the sequentially reading the target data blocks from the first cache queue to the dynamic random access memory and performing data parallel processing in combination with the model parameters includes:
temporarily storing the processing result output by each data processing task into a second cache queue;
and splicing the processing results output by a single data processing task in the second cache queue according to their sequence, to obtain a complete merging result corresponding to each data processing task.
12. The heterogeneous memory pool concurrent access method according to claim 11, wherein after the splicing, in sequence, of the processing results output by the single data processing task in the second cache queue to obtain the complete merging result corresponding to each data processing task, the method further comprises:
and reading the complete merging result from a second cache queue positioned in the dynamic random access memory to the persistent memory for storage.
13. The method for concurrent access to heterogeneous memory pools according to any one of claims 1 to 12, wherein the constructing a two-layer index according to each input data and a data block corresponding to the input data in the persistent memory includes:
constructing a first-layer index according to each input data in the persistent memory;
and constructing a second layer index according to the data block corresponding to the input data on the basis of the first layer index.
14. The heterogeneous memory pool concurrent access method according to claim 13, wherein the constructing a first layer index according to each of the input data in the persistent memory comprises:
And constructing an unordered index according to each input data in the persistent memory, and taking the unordered index as the first-layer index.
15. The heterogeneous memory pool concurrent access method according to claim 14, wherein the constructing an unordered index from each of the input data in the persistent memory comprises:
and constructing the unordered index based on a hash index mode and according to each input data in the persistent memory.
16. The heterogeneous memory pool concurrent access method according to claim 14, wherein constructing a second layer index according to the data block corresponding to the input data based on the first layer index comprises:
on the basis of the first layer index, an ordered index is constructed according to the data blocks corresponding to each piece of input data and the sequence of the data blocks;
the ordered index is used as the second layer index to obtain a mixed index for the input data.
17. The heterogeneous memory pool concurrent access method according to claim 16, wherein the constructing an ordered index according to the data block corresponding to each input data and the order of the data blocks comprises:
and constructing the ordered index based on a B+ tree index mode and according to the data blocks corresponding to the input data and the order of the data blocks.
18. A heterogeneous memory pool concurrent access device, comprising:
the data block splitting module is used for acquiring input data related to the data processing model and splitting the input data to obtain corresponding data blocks;
the storage module is used for storing the model parameters related to the data processing model into a dynamic random access memory and storing the data block into a persistent memory;
the index construction module is used for constructing a two-layer index according to each input data and the data block corresponding to the input data in the persistent memory;
the memory access module is used for acquiring read requests of at least two data processing tasks, accessing the persistent memory in parallel according to the read requests and the two-layer index, and reading a target data block from the persistent memory to the dynamic random access memory so as to execute data processing according to the target data block and the model parameters;
the data block splitting module is further configured to split the input data according to a preset splitting rule according to size parameters corresponding to the input data and the model parameters, so as to obtain a data block with a size larger than the model parameters and proportional to the size of the model parameters;
The data block splitting module is further configured to calculate a first ratio of the width of the input data to the width of the model parameter, and take a product of the first ratio and the first ratio parameter as the width of the data block; calculating a second ratio of the length of the input data to the length of the model parameter, and taking the product of the second ratio and the second ratio parameter as the length of the data block; splitting the input data according to the length and the width corresponding to the data block.
19. An electronic device, comprising:
a memory for storing a computer program;
a processor configured to execute the computer program to implement the heterogeneous memory pool concurrent access method of any one of claims 1 to 17.
20. A computer-readable storage medium storing a computer program; wherein the computer program when executed by a processor implements the heterogeneous memory pool concurrent access method of any of claims 1 to 17.
CN202310967987.3A 2023-08-03 2023-08-03 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool Active CN116700995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310967987.3A CN116700995B (en) 2023-08-03 2023-08-03 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310967987.3A CN116700995B (en) 2023-08-03 2023-08-03 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Publications (2)

Publication Number Publication Date
CN116700995A CN116700995A (en) 2023-09-05
CN116700995B true CN116700995B (en) 2023-11-03

Family

ID=87837760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310967987.3A Active CN116700995B (en) 2023-08-03 2023-08-03 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Country Status (1)

Country Link
CN (1) CN116700995B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9804803B2 (en) * 2015-08-20 2017-10-31 Sap Se Data access in hybrid main memory systems

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222125A (en) * 2020-01-21 2021-08-06 北京希姆计算科技有限公司 Convolution operation method and chip
CN111898081A (en) * 2020-07-09 2020-11-06 上海兆芯集成电路有限公司 Convolution operation method and convolution operation device
CN113220693A (en) * 2021-06-02 2021-08-06 北京字节跳动网络技术有限公司 Computing storage separation system, data access method, medium and electronic device thereof
CN113867633A (en) * 2021-09-24 2021-12-31 中科院成都信息技术股份有限公司 Heterogeneous hybrid memory data processing method, system and storage medium based on DRAM and NVM
CN114218127A (en) * 2021-12-10 2022-03-22 中山大学 Deep learning data management method on hybrid memory multi-core CPU system
CN114266302A (en) * 2021-12-16 2022-04-01 浙江大学 Deep learning Embedding data efficient processing system and method for heterogeneous memory device
CN115269968A (en) * 2022-06-24 2022-11-01 孟祥坤 Internet big data keyword word searching method of improved RDF
CN115481717A (en) * 2022-09-13 2022-12-16 安谋科技(中国)有限公司 Method for operating neural network model, readable medium and electronic device
CN116244214A (en) * 2023-02-06 2023-06-09 清华大学 Model parameter acquisition method and device, server device and storage medium
CN116340266A (en) * 2023-03-28 2023-06-27 上海科技大学 Fine-grained file system and file read-write method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on efficient and scalable fine-grained cache management for hybrid storage; Jiang Guosong; Computer Science (08); full text *
Model parallel training optimization method for hybrid heterogeneous architectures; Zhao Yaqian; Computer Engineering & Science; Vol. 43 (No. 01); full text *

Also Published As

Publication number Publication date
CN116700995A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN108009008B (en) Data processing method and system and electronic equipment
Randell A note on storage fragmentation and program segmentation
JP7430744B2 (en) Improving machine learning models to improve locality
US9280300B2 (en) Techniques for dynamically relocating virtual disk file blocks between flash storage and HDD-based storage
US8813091B2 (en) Distribution data structures for locality-guided work stealing
US10572383B2 (en) Caching a block of data in a multi-tenant cache storage device based on space usage boundary estimates
US9377954B2 (en) System and method for memory allocation in a multiclass memory system
US20140032614A1 (en) Database partition management
US20140082316A1 (en) Selecting pages implementing leaf nodes and internal nodes of a data set index for reuse
US20210312014A1 (en) Asymmetric allocation of sram and data layout for efficient matrix-matrix multiplication
CN111737168A (en) Cache system, cache processing method, device, equipment and medium
Chen et al. moDNN: Memory optimal DNN training on GPUs
US11599798B2 (en) Methods of operating a graphics processing unit (GPU) to train a deep neural network using a GPU local memory and related articles of manufacture
US11429299B2 (en) System and method for managing conversion of low-locality data into high-locality data
CN107220069B (en) Shuffle method for nonvolatile memory
Chen et al. Data prefetching and eviction mechanisms of in-memory storage systems based on scheduling for big data processing
CN116700995B (en) Concurrent access method, device, equipment and storage medium for heterogeneous memory pool
US9552298B2 (en) Smart pre-fetch for sequential access on BTree
US11442643B2 (en) System and method for efficiently converting low-locality data into high-locality data
Das Algorithmic Foundation of Parallel Paging and Scheduling under Memory Constraints
WO2023097424A1 (en) Method and apparatus for fusing layers of different models
US20230121052A1 (en) Resource resettable deep neural network accelerator, system, and method
CN115797148A (en) Graph calculation acceleration method and system based on cross-iteration data prefetching
Wang et al. Tide-tree: A self-tuning indexing scheme for hybrid storage system
CN115809243A (en) B-tree-based overlapping community discovery method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant