WO2011127649A1 - Method and device for processing common data structure - Google Patents

Method and device for processing common data structure

Info

Publication number
WO2011127649A1
Authority
WO
WIPO (PCT)
Prior art keywords
data structure
memory
core
sub
common data
Prior art date
Application number
PCT/CN2010/071736
Other languages
French (fr)
Chinese (zh)
Inventor
胡睿
钱俊
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN201080003755.7A (published as CN102362256B)
Priority to PCT/CN2010/071736
Publication of WO2011127649A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • The present invention relates to multi-core processor technology, and more particularly to a method and an apparatus for processing a common data structure. Background Art
  • A traditional CPU has a single core inside, i.e., it is a single-core CPU.
  • When common data structures are processed on a single-core CPU, although multiple processes or threads may run concurrently, only one thread's instructions execute at any given moment, so protecting common data structures such as global variables and peripherals is relatively simple.
  • To further increase the processing speed of common data structures, multi-core CPUs have emerged.
  • Multi-core CPUs, i.e., CPUs containing multiple cores, emerged as semiconductor technology developed and CPU integration density and clock frequency kept rising. Multi-core CPUs have evolved from the first dual-core parts to the dozens of cores available today, and some multi-core processors under development even have thousands of cores. A first existing method still processes a complex common data structure on a single core: when threads are divided, any complex common data structure is placed on one core for processing. When the processing of a particular data structure consumes a large amount of CPU time, this method cannot keep the load on the cores balanced, so some cores carry heavy processing tasks while others sit idle, and the advantage of multi-core parallel computing cannot be fully exploited.
  • A second method processes a complex common data structure with multiple cores, but the parts of the data structure processed by different cores are completely unrelated. For example, performing some function may require one core to first process a ternary tree and another core to then process a binary tree, with no direct association between the two trees. Although this method spreads the common data structure across multiple cores, the division of computing resources between the cores is almost entirely constrained by the composition of the workload, i.e., of the common data structure. It still leads to an uneven load between cores and cannot fully exploit the advantages of a multi-core processor.
  • A third method spreads operations on a strongly coupled common data structure across different cores but applies mutual-exclusion protection (such as a spin lock) when the data structure is accessed. With this method, when multiple cores access the common data structure at the same time, many cores may end up waiting, which wastes CPU resources. Summary of the Invention
  • Embodiments of the present invention provide a method and apparatus for processing a common data structure to implement parallel processing of a common data structure on a multi-core processor.
  • Embodiments of the present invention provide a method for processing a common data structure, including:
  • distributing each sub-data structure in the common data structure to a corresponding core on the multi-core processor through a common part in the common data structure;
  • the common portion includes a data range of the respective sub-data structures;
  • separately processing, by the corresponding cores on the multi-core processor, the distributed sub-data structures.
  • the embodiment of the invention further provides an apparatus for processing a common data structure, including:
  • a distribution module, configured to distribute each sub-data structure in the common data structure to a corresponding core on a multi-core processor through a common part in the common data structure, where the common part includes the data range of each sub-data structure;
  • a processing module located on a corresponding core of the multi-core processor, for separately processing the distributed sub-data structure.
  • The technical solution provided by the foregoing embodiments distributes each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores and enables the cores of the multi-core processor to process the common data structure in parallel, fully exploiting the advantages of the multi-core processor, improving its processing efficiency, and thereby reducing the situations in which many cores wait and avoiding waste of CPU resources.
  • FIG. 4 is a schematic diagram of the division of the numbering space of the sub-data structures in the method according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the full binary tree used in the method;
  • FIG. 6 is a schematic diagram of obtaining a core number in the method;
  • FIG. 7 is a schematic diagram of adjusting the boundaries of the sub-data structures in the method;
  • FIG. 8 is a schematic diagram of one distribution in the method;
  • FIG. 9 is a schematic diagram of another distribution in the method;
  • FIG. 10 is a schematic flowchart of the use of a spin lock according to the present invention.
  • FIG. 11 is a schematic structural diagram of an apparatus for processing a common data structure according to an embodiment of the present invention
  • FIG. 12 is a schematic structural diagram of an underlying processing module 113 in an apparatus for processing a common data structure according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for processing a common data structure according to an embodiment of the present invention. As shown in Figure 1, the method includes:
  • Step 11 Distribute each sub-data structure in the common data structure to a corresponding core on the multi-core processor by using a common part in the common data structure; the common part includes a data range of the respective sub-data structures;
  • Step 12: The corresponding cores on the multi-core processor separately process the distributed sub-data structures, for example, performing lookups in a sub-data structure.
  • The technical solution provided by the foregoing embodiment distributes each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores and enables the cores of the multi-core processor to process the common data structure in parallel, fully exploiting the advantages of the multi-core processor and improving its processing efficiency.
  • In step 11 above, the distribution may be performed by one core, by multiple cores, or by processing modules on one or more cores, so that each sub-data structure is distributed to a different core; that is, different cores process different parts of the common data structure.
  • A common data structure is a complete data structure that multiple cores process in parallel during the same time period.
  • The common data structure may be, as shown in FIG. 2, the same binary tree that multiple cores all have to process in their work.
  • In order to efficiently distribute the processing of the common data structure to multiple cores, the common data structure is abstracted in a general way. As shown in FIG. 3, the common data structure is divided, from top to bottom, into three layers: the common part, the sub-data structures, and the memory management part.
  • the common part can be divided into several units, each unit corresponding to a unique sub-data structure.
  • Each sub-data structure can be accessed through the information in its corresponding unit.
  • From the common part it is also possible to determine the computing resources that the sub-data structure corresponding to each unit needs to consume, for example how many cores are required or what processing speed is required, and to allocate computing resources, that is, cores, to each sub-data structure.
  • There are two methods for allocating core resources: a static allocation method and a dynamic allocation method. In the static allocation method, core resources are, for example, allocated evenly according to the number of sub-data structures, or allocated with weights according to the distribution characteristics of the core resources consumed by each sub-data structure.
  • The dynamic allocation method can dynamically allocate core resources according to the actual load of each core.
  • Assume that the number of cores in the multi-core processor is 2^k (this simplifies the design of the common data structure and the computation of index data; if the number of cores is not a power of 2, it can be padded up to a power of 2), where k is a positive integer. As shown in FIG. 4, 2^k - 1 boundaries are designated in the numbering space of the sub-data structures (not including the two endpoints of the numbering space); the 2^k - 1 boundaries are denoted b_i (1 <= i <= 2^k - 1), where b_i is a variable whose value is the number of the corresponding sub-data structure.
  • The i-th core (0 <= i < 2^k) processes the sub-data structures whose numbers fall within the interval [b_i, b_(i+1)).
  • As shown in FIG. 5, the nodes of a full binary tree (which can be organized without pointers, using position indices directly) store the boundaries of the sub-data structures processed by the cores. The number of the sub-data structure to be processed is compared against the full binary tree and the comparison path is recorded, which yields the core number.
  • As shown in FIG. 6, if the sub-data structure number corresponding to a newly added entry is 70, the comparison path is 101 (as indicated by the arrows); that is, the number of the resulting core is 101 in binary, or 5 in decimal, and the sub-data structure numbered 70 is processed on core 5.
  • That is, core 5 is assigned to the sub-data structure numbered 70.
  • Each of the 2^k cores corresponds to a queue of messages to be processed. When the data in the queue corresponding to a core exceeds a watermark T, as shown in FIG. 7, the number of sub-data structures that this core may process is reduced by 2; that is, the upper and lower boundaries each shrink inward by 1: b_i = b_i + 1, b_(i+1) = b_(i+1) - 1.
  • The method for allocating core resources is not limited to the above; other methods for balancing the load between cores may also be used to allocate core resources, which are not repeated here.
  • The sub-data structure is then handed to the assigned core; that is, a message is sent to the assigned core to inform it which sub-data structure it is to process. Step 12 is then performed.
  • the memory management part includes memory resource parameters of the common data structure, which can provide a basis for management of memory resources such as allocation.
  • Step 13 Allocate memory resources for the core in step 12 according to the memory management part in the common data structure. Specifically, it may include:
  • For example, when a core processing a sub-data structure has no available memory resources, a memory block can be requested according to the memory management part.
  • Memory is allocated to each core according to the memory management part so that each core has space for processing its sub-data structure, and the memory released by each core is managed.
  • Before the cores of the multi-core processor separately process the distributed sub-data structures, memory resources are requested according to the memory management part.
  • According to the memory management part, the memory is divided into blocks, memory is allocated to the core that requests memory resources, and that core thereby obtains memory blocks available for its own use.
  • The memory blocks requested by a core can further be divided into pages, that is, memory paging, which ensures that the memory of different cores is independent of each other.
  • For any core, when it builds a data structure that it manages itself, it requests memory resources from the memory pages in its own memory blocks. When memory pages are requested inside a core, no mutual exclusion, that is, no lock protection, is needed.
  • A new memory block is requested, that is, memory is dynamically allocated, only when none of the memory blocks the core has already requested has free space.
  • When a core requests and releases memory, the amount of memory in use in each memory block is recorded, and all memory blocks of that core are sorted by the amount of memory in use.
  • When requesting memory, memory is preferentially requested from the block of the core with the highest usage rate; when releasing memory, the data in the memory block with the lowest usage rate is, where possible, moved into the memory block with the highest usage rate that is not yet full. When it is found that a memory block of the core no longer contains any used memory, the block is released into the free-memory-block resource pool and is again managed according to the memory management part so that other cores can request it. This makes full use of the memory blocks and improves the utilization of memory resources, and the memory resources requested by the core processing a sub-data structure are protected by a lock.
  • At this point, the processing of the entire common data structure, as shown in FIG. 8, can be divided into three parts: distribution, separate processing, and underlying common processing.
  • The underlying common processing allocates memory resources to the cores that process the sub-data structures.
  • When the distribution in step 11 is done hierarchically, as shown in FIG. 9, the distribution processing includes two levels: distribution 1 and distribution 2.
  • Hierarchical distribution means redistributing groups of sub-data structures that cannot yet be handled by individual cores, until every sub-data structure is processed by a single core.
  • Taking FIG. 9 as an example, distribution 1 divides the sub-data structures into two parts: one part contains only one sub-data structure, so after distribution 1 it can already be processed separately; the other part contains multiple sub-data structures, so distribution 1 alone cannot achieve separate processing, and distribution must be performed again, that is, distribution 2, to achieve separate processing.
  • the above lock protection can be a spin lock.
  • the flow used by the spin lock is shown in Figure 10.
  • The memory resources are locked: when multiple cores access the common data structure, each core must acquire the spin lock before accessing it. If acquisition succeeds, the core operates on the shared resource and then unlocks the memory resource. If the spin lock is held by another core when a core tries to acquire it, the thread on the requesting core does not sleep but keeps checking whether the lock has been released. While a spin lock is spinning, the CPU does no useful work and is in fact wasting CPU time, so spin locks are suitable for scenarios in which the access time to the common data structure is short; more precisely, spin locks are suitable for scenarios in which the access time to the shared resource is shorter than, or comparable to, the task-switching time. Therefore, in this embodiment, only the memory resources requested by the cores in step 13 are protected by the lock.
  • FIG. 11 is a schematic structural diagram of an apparatus for processing a common data structure according to an embodiment of the present invention.
  • the device includes: a distribution module 111 and a processing module 112.
  • the distribution module 111 is configured to distribute each sub-data structure in the common data structure to a corresponding core on the multi-core processor through a common part in the common data structure; the common part includes a data range of the respective sub-data structure .
  • the processing module 112 is located on a corresponding core on the multi-core processor for separately processing the distributed sub-data structures.
  • The apparatus for processing a common data structure may further include an underlying processing module 113, configured to allocate, according to the memory management part in the common data structure, memory resources to the corresponding cores on the multi-core processor for processing the sub-data structures.
  • the underlying processing module 113 may be specifically configured to allocate a memory resource to a corresponding core on the multi-core processor according to a memory management portion in the common data structure in a case where a memory space is reserved.
  • The apparatus for processing a common data structure may further include a static allocation module 114, configured to allocate corresponding cores evenly to the sub-data structures in the common data structure according to the number of sub-data structures in the common data structure and the core resources on the multi-core processor, or to allocate corresponding cores with weights according to the distribution characteristics of the resources consumed by each sub-data structure in the common data structure.
  • the distribution module 111 may be specifically configured to distribute each sub-data structure in the common data structure to a corresponding core allocated by the static allocation module by a common part in a common data structure.
  • The apparatus for processing a common data structure may further include a dynamic allocation module 115, configured to compare the number of a sub-data structure to be processed with a full binary tree to obtain a comparison path; the nodes of the full binary tree store the numbers of the sub-data structures located at specified boundaries in the numbering space of the sub-data structures in the common data structure, and the number of boundaries is one less than the number of cores on the multi-core processor.
  • the distribution module 111 may be specifically configured to distribute each sub-data structure in the common data structure to a core numbered as a value of the comparison path by a common part in a common data structure.
  • The underlying processing module 113 may include a resource allocation sub-module 116 and a lock protection sub-module 117, where the resource allocation sub-module 116 is configured to, when a core processing a sub-data structure requests memory resources, allocate memory resources to the corresponding core on the multi-core processor according to the memory management part in the common data structure.
  • The lock protection sub-module 117 is configured to apply lock protection to the memory resources requested by the core processing a sub-data structure.
  • Alternatively, as shown in FIG. 12, the underlying processing module 113 may include a recording sub-module 121 and a management sub-module 122, where the recording sub-module 121 is located on each core and is used to record the amount of memory in use in each memory block of that core and to sort all memory blocks in the core by the amount of memory in use.
  • The management sub-module 122 is located on each core and is used to request memory from the memory blocks in descending order of usage rate; when memory is released, the data in the memory block with the lowest usage rate is moved into the memory block with the highest usage rate that is not yet full.
  • In the above embodiments, the apparatus for processing a common data structure distributes, through the distribution module, each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores, enables the cores of the multi-core processor to process the common data structure in parallel, fully exploits the advantages of the multi-core processor, and improves its processing efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and device for processing a common data structure. The method includes: each sub-data-structure of a common data structure is distributed through a common part of said common data structure to the corresponding core of a multi-core processor, wherein said common part contains the data range of said each sub-data-structure; and corresponding cores of said multi-core processor individually process the distributed sub-data-structures. Through the distribution of each sub-data-structure of the common data structure to the corresponding core of the multi-core processor for individual processing, an inter-core load imbalance is avoided, and the cores of the multi-core processor are enabled to concurrently process the common data structure, thus taking full advantage of the multi-core processor and improving processing efficiency of the multi-core processor.

Description

Method and Device for Processing a Common Data Structure

Technical Field

The present invention relates to multi-core processor technology, and more particularly to a method and an apparatus for processing a common data structure.

Background Art

A traditional CPU has a single core inside, i.e., it is a single-core CPU. When common data structures are processed on a single-core CPU, although multiple processes or threads may run concurrently, only one thread's instructions execute at any given moment, and protecting common data structures such as global variables and peripherals is relatively simple. To further increase the processing speed of common data structures, multi-core CPUs have emerged.

Multi-core CPUs, i.e., CPUs containing multiple cores, emerged as semiconductor technology developed and CPU integration density and clock frequency kept rising. Multi-core CPUs have evolved from the first dual-core parts to the dozens of cores available today, and some multi-core processors under development even have thousands of cores. A first existing method still processes a complex common data structure on a single core: when threads are divided, any complex common data structure is placed on one core for processing. When the processing of a particular data structure consumes a large amount of CPU time, this method cannot keep the load on the cores balanced, so some cores carry heavy processing tasks while others sit idle, and the advantage of multi-core parallel computing cannot be fully exploited.

A second method processes a complex common data structure with multiple cores, but the parts of the data structure processed by different cores are completely unrelated. For example, performing some function may require one core to first process a ternary tree and another core to then process a binary tree, with no direct association between the two trees. Although this method spreads the common data structure across multiple cores, the division of computing resources between the cores is almost entirely constrained by the composition of the workload, i.e., of the common data structure. It still leads to an uneven load between cores and cannot fully exploit the advantages of a multi-core processor.

A third method spreads operations on a strongly coupled common data structure across different cores but applies mutual-exclusion protection (such as a spin lock) when the data structure is accessed. With this method, when multiple cores access the common data structure at the same time, many cores may end up waiting, which wastes CPU resources.

Summary of the Invention
Embodiments of the present invention provide a method and an apparatus for processing a common data structure, so as to achieve parallel processing of a common data structure on a multi-core processor.

An embodiment of the present invention provides a method for processing a common data structure, including:

distributing each sub-data structure in the common data structure to a corresponding core on a multi-core processor through a common part in the common data structure, where the common part includes the data range of each sub-data structure; and

separately processing, by the corresponding cores on the multi-core processor, the distributed sub-data structures.

An embodiment of the present invention further provides an apparatus for processing a common data structure, including:

a distribution module, configured to distribute each sub-data structure in the common data structure to a corresponding core on a multi-core processor through a common part in the common data structure, where the common part includes the data range of each sub-data structure; and

a processing module, located on the corresponding core of the multi-core processor and configured to separately process the distributed sub-data structures.

The technical solutions provided by the foregoing embodiments distribute each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores, enables the cores of the multi-core processor to process the common data structure in parallel, fully exploits the advantages of the multi-core processor, improves its processing efficiency, and thereby reduces the situations in which many cores wait and avoids waste of CPU resources.

Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for processing a common data structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a common data structure;
FIG. 3 is a schematic diagram of the layered abstraction of the common data structure;
FIG. 4 is a schematic diagram of the division of the numbering space of the sub-data structures;
FIG. 5 is a schematic diagram of the full binary tree used in the method;
FIG. 6 is a schematic diagram of obtaining a core number in the method;
FIG. 7 is a schematic diagram of adjusting the boundaries of the sub-data structures;
FIG. 8 is a schematic diagram of one distribution in the method;
FIG. 9 is a schematic diagram of another distribution in the method;
FIG. 10 is a schematic flowchart of the use of a spin lock according to the present invention;
FIG. 11 is a schematic structural diagram of an apparatus for processing a common data structure according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of the underlying processing module 113 in an apparatus for processing a common data structure according to an embodiment of the present invention.

Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a method for processing a common data structure according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 11: Distribute each sub-data structure in the common data structure to a corresponding core on the multi-core processor through the common part in the common data structure, where the common part includes the data range of each sub-data structure.

Step 12: The corresponding cores on the multi-core processor separately process the distributed sub-data structures, for example, performing lookups in a sub-data structure.

The technical solution provided by the foregoing embodiment distributes each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores, enables the cores of the multi-core processor to process the common data structure in parallel, fully exploits the advantages of the multi-core processor, and improves its processing efficiency.
In step 11 above, the distribution may be performed by one core, by multiple cores, or by processing modules on one or more cores, so that the sub-data structures are distributed to different cores; that is, different cores process different parts of the common data structure.

A common data structure is a complete data structure that multiple cores process in parallel during the same time period. The common data structure may be, as shown in FIG. 2, the same binary tree that multiple cores all have to process in their work.

In order to efficiently distribute the processing of the common data structure to multiple cores, the common data structure is abstracted in a general way. As shown in FIG. 3, the common data structure is divided, from top to bottom, into three layers: the common part, the sub-data structures, and the memory management part.

There are multiple sub-data structures, and the common part can be divided into several units, each unit corresponding to exactly one sub-data structure. Each sub-data structure can be accessed through the information in its corresponding unit. From the common part it is also possible to determine the computing resources that the sub-data structure corresponding to each unit needs to consume, for example how many cores or what processing speed is required, and to allocate computing resources, that is, cores, to each sub-data structure.
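Purely as an editorial illustration, the three-layer abstraction just described might be modelled roughly as follows in C; every type and field name here (common_unit, mem_mgmt_part, and so on) is hypothetical and is not taken from the patent itself.

```c
#include <stddef.h>
#include <stdint.h>

/* One sub-data structure, e.g. a subtree of the shared binary tree
   (hypothetical representation). */
typedef struct sub_data_structure {
    uint32_t id;            /* number of this sub-data structure */
    void    *root;          /* root of the subtree / payload     */
} sub_data_structure;

/* One unit of the common part: it records the data range covered by a
   sub-data structure and how to reach it. */
typedef struct common_unit {
    uint32_t range_lo;      /* lowest key handled by this unit      */
    uint32_t range_hi;      /* highest key handled by this unit     */
    uint32_t est_cost;      /* estimated computing resources needed */
    sub_data_structure *sub;
} common_unit;

/* Memory management part: parameters used later when memory blocks
   are handed out to the cores. */
typedef struct mem_mgmt_part {
    size_t block_size;
    size_t page_size;
    size_t reserved_bytes;
} mem_mgmt_part;

/* The common data structure seen from top to bottom:
   common part -> sub-data structures -> memory management part. */
typedef struct common_data_structure {
    common_unit  *units;
    size_t        unit_count;
    mem_mgmt_part mem;
} common_data_structure;
```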
There are two methods for allocating core resources: a static allocation method and a dynamic allocation method. In the static allocation method, core resources are, for example, allocated evenly according to the number of sub-data structures, or allocated with weights according to the distribution characteristics of the core resources consumed by each sub-data structure.
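A minimal sketch of the static allocation just described is given below; it assumes each sub-data structure carries a cost estimate, and the proportional rule used here is only one plausible reading of the "weighted allocation" mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Statically map each sub-data structure (unit) to a core.  With
   weights == NULL the units are spread evenly over the cores;
   otherwise a unit's place among the cores follows the cumulative
   estimated cost, so costly units get a larger share of the cores. */
static void static_assign(uint32_t *core_of_unit, const uint32_t *weights,
                          size_t unit_count, uint32_t core_count)
{
    uint64_t total = 0, acc = 0;

    for (size_t i = 0; i < unit_count; i++)
        total += weights ? weights[i] : 1;
    if (total == 0)
        total = 1;                      /* avoid dividing by zero */

    for (size_t i = 0; i < unit_count; i++) {
        core_of_unit[i] = (uint32_t)((acc * core_count) / total);
        acc += weights ? weights[i] : 1;
    }
}
```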
The dynamic allocation method can dynamically allocate core resources according to the actual load of each core.
Assume that the number of cores in the multi-core processor is 2^k (this simplifies the design of the common data structure and the computation of index data; if the number of cores is not a power of 2, it can be padded up to a power of 2), where k is a positive integer. As shown in FIG. 4, 2^k - 1 boundaries are designated in the numbering space of the sub-data structures (not including the two endpoints of the numbering space); the 2^k - 1 boundaries are denoted b_i (1 <= i <= 2^k - 1), where b_i is a variable whose value is the number of the corresponding sub-data structure. The i-th core (0 <= i < 2^k) processes the sub-data structures whose numbers fall within the interval [b_i, b_(i+1)). As shown in FIG. 5, the nodes of a full binary tree (which can be organized without pointers, using position indices directly) store the boundaries of the sub-data structures processed by the cores. The number of the sub-data structure to be processed is compared against the full binary tree and the comparison path is recorded, which yields the core number. As shown in FIG. 6, if the sub-data structure number corresponding to a newly added entry is 70, the comparison path is 101 (as indicated by the arrows); that is, the number of the resulting core is 101 in binary, or 5 in decimal, and the sub-data structure numbered 70 is processed on core 5. That is, core 5 is assigned to the sub-data structure numbered 70.
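The boundary comparison of FIG. 5 and FIG. 6 can be sketched as follows, assuming the boundary tree is stored as a position-indexed array; the concrete boundary values below are invented, but chosen so that the example of sub-data structure 70 reproduces the comparison path 101 and core 5.

```c
#include <stdint.h>
#include <stdio.h>

/* boundaries[] holds the 2^k - 1 boundary values arranged as a full
   binary tree in level order: no pointers are needed, the children of
   node i sit at positions 2*i + 1 and 2*i + 2. */
static uint32_t core_for(uint32_t sub_id, const uint32_t *boundaries, int k)
{
    uint32_t node = 0, path = 0;

    for (int level = 0; level < k; level++) {
        uint32_t right = (sub_id >= boundaries[node]); /* go right if id >= boundary */
        path = (path << 1) | right;                    /* record the comparison path */
        node = 2 * node + 1 + right;
    }
    return path;   /* the path, read as a binary number, is the core number */
}

int main(void)
{
    /* k = 3, i.e. 8 cores and 7 boundaries; the values are invented
       for this example but chosen so that sub-data structure 70
       follows the path 1, 0, 1 as in the patent's FIG. 6. */
    const uint32_t boundaries[7] = { 64, 32, 80, 16, 48, 68, 112 };

    printf("core = %u\n", core_for(70, boundaries, 3)); /* prints: core = 5 */
    return 0;
}
```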
Each of the 2^k cores corresponds to a queue of messages to be processed. When the data in the queue corresponding to a core exceeds a watermark T, as shown in FIG. 7, the number of sub-data structures that this core may process is reduced by 2; that is, the upper and lower boundaries each shrink inward by 1: b_i = b_i + 1, b_(i+1) = b_(i+1) - 1.
The method for allocating core resources is not limited to the above; other methods for balancing the load between cores may also be used, which are not repeated here. The sub-data structure is then handed to the assigned core; that is, a message is sent to the assigned core to inform it which sub-data structure it is to process. Step 12 is then performed. The memory management part includes the memory resource parameters of the common data structure and can provide a basis for managing, for example allocating, memory resources.
Step 13: Allocate memory resources to the cores in step 12 according to the memory management part in the common data structure. Specifically, this may include:

when a core processing a sub-data structure requests memory resources, allocating memory resources to the corresponding core on the multi-core processor according to the memory management part in the common data structure;
For example, when a core processing a sub-data structure has no available memory resources, a memory block can be requested according to the memory management part. Memory is allocated to each core according to the memory management part so that each core has space for processing its sub-data structure, and the memory released by each core is managed. When memory is allocated, a portion of memory can be reserved according to how heavily the memory is expected to be occupied: if memory is heavily occupied, more memory is reserved; if it is lightly occupied, less is reserved; and if memory resources are plentiful, as much memory as possible is reserved. In other words, space is traded for time: by reserving more memory space, the time spent on memory management is reduced, so that, compared with the case where all memory is shared and the processing module on each core must use lock protection when requesting or releasing memory, the time consumed by memory management is reduced and degradation of CPU data-processing performance is avoided.
Before the cores of the multi-core processor separately process the distributed sub-data structures, memory resources are requested according to the memory management part. According to the memory management part, the memory is divided into blocks and allocated to the cores that request memory resources, so that each core obtains memory blocks available for its own use. The memory blocks requested by a core can further be divided into pages, i.e., memory paging, which ensures that the memory of different cores is independent. For any core, when it builds a data structure that it manages itself, it requests memory resources from the memory pages in its own memory blocks. When memory pages are requested inside a core to process its sub-data structure, no mutual exclusion, i.e., no lock protection, is needed.
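A rough sketch of the per-core memory handling just described, assuming fixed-size blocks divided into pages; because a core only inspects its own blocks on this path, no lock is taken, and a new block is requested from the shared pool only when every existing block is full. The structures, sizes, and the pool call are illustrative assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGES_PER_BLOCK 64
#define PAGE_SIZE       4096

/* One memory block owned by a single core, divided into pages. */
typedef struct mem_block {
    unsigned char    *base;                  /* PAGES_PER_BLOCK * PAGE_SIZE bytes */
    unsigned char     used[PAGES_PER_BLOCK]; /* 1 if the page has been handed out */
    size_t            used_pages;
    struct mem_block *next;
} mem_block;

/* Per-core state: the blocks this core has requested so far. */
typedef struct core_mem {
    mem_block *blocks;
} core_mem;

/* Request a new block from the shared memory management part.  In a
   real system this is the step that would be protected by a lock. */
static mem_block *request_block_from_pool(void)
{
    mem_block *b = calloc(1, sizeof(*b));
    if (b && !(b->base = malloc((size_t)PAGES_PER_BLOCK * PAGE_SIZE))) {
        free(b);
        b = NULL;
    }
    return b;
}

/* Allocate one page for this core.  Only the core's own blocks are
   inspected, so no mutual exclusion is needed on this path.  A new
   block is requested only when every existing block is full. */
static void *core_alloc_page(core_mem *cm)
{
    for (mem_block *b = cm->blocks; b; b = b->next)
        for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
            if (!b->used[i]) {
                b->used[i] = 1;
                b->used_pages++;
                return b->base + i * PAGE_SIZE;
            }

    mem_block *nb = request_block_from_pool();   /* dynamic allocation */
    if (!nb)
        return NULL;
    nb->used[0] = 1;
    nb->used_pages = 1;
    nb->next = cm->blocks;
    cm->blocks = nb;
    return nb->base;
}
```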
When allocating memory, it is also possible, for any core, to request a new memory block, i.e., to allocate memory dynamically, only when none of the memory blocks the core has already requested has free space. Moreover, when a core requests and releases memory, the amount of memory in use in each memory block is recorded, and all memory blocks of that core are sorted by the amount of memory in use. When requesting memory, memory is preferentially requested from the block of the core with the highest usage rate; when releasing memory, an attempt is made to move the data in the memory block with the lowest usage rate into the memory block with the highest usage rate that is not yet full. When it is found that a memory block of the core no longer contains any used memory, the block is released into the free-memory-block resource pool and is again managed according to the memory management part so that other cores can request it. This makes full use of the memory blocks and improves the utilization of memory resources, and the memory resources requested by the core processing a sub-data structure are protected by a lock.
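A separate, self-contained sketch of the usage-ordered policy at block granularity; the counters and the hand-over of empty blocks to the shared pool follow the description above, while the concrete layout is an assumption.

```c
#include <stddef.h>

#define BLOCKS 8    /* blocks this core currently owns (example size) */
#define PAGES  64   /* pages per block                                */

/* pages_in_use[i] counts the pages handed out from block i of this core. */
static size_t pages_in_use[BLOCKS];

static void on_page_allocated(int block) { pages_in_use[block]++; }

/* Allocation policy: prefer the block with the highest usage that is
   not yet full, so lightly used blocks can drain and be given back. */
static int pick_block_for_alloc(void)
{
    int best = -1;
    for (int i = 0; i < BLOCKS; i++)
        if (pages_in_use[i] < PAGES &&
            (best < 0 || pages_in_use[i] > pages_in_use[best]))
            best = i;
    return best;   /* -1 means every block is full: request a new one */
}

/* Hypothetical, lock-protected hand-over of a now-empty block to the
   shared free-block resource pool so that other cores can request it. */
static void return_block_to_pool(int block) { (void)block; }

/* Called whenever one page of `block` is released. */
static void on_page_released(int block)
{
    if (pages_in_use[block] > 0 && --pages_in_use[block] == 0)
        return_block_to_pool(block);
}
```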
At this point, the processing of the entire common data structure, as shown in FIG. 8, can be divided into three parts: distribution, separate processing, and underlying common processing. The underlying common processing allocates memory resources to the cores that process the sub-data structures. When the distribution in step 11 is done hierarchically, as shown in FIG. 9, the distribution processing includes two levels: distribution 1 and distribution 2. Hierarchical distribution means redistributing groups of sub-data structures that cannot yet be handled by individual cores, until every sub-data structure is processed by a single core. Taking FIG. 9 as an example, distribution 1 divides the sub-data structures into two parts: one part contains only one sub-data structure, so after distribution 1 it can already be processed separately; the other part contains multiple sub-data structures, so distribution 1 alone cannot achieve separate processing, and distribution must be performed again, i.e., distribution 2, to achieve separate processing.
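One possible reading of the two-level distribution is a recursive split that stops once a group holds a single sub-data structure; the recursion and the message-style notification below are illustrative assumptions rather than the patent's exact procedure.

```c
#include <stdio.h>

/* Hypothetical stand-in for sending a message that tells core `core`
   to process the sub-data structure numbered `sub_id`. */
static void notify_core(int core, int sub_id)
{
    printf("core %d <- sub-data structure %d\n", core, sub_id);
}

/* Distribute subs[lo..hi) over the cores starting at first_core: each
   call is one distribution level (distribution 1, distribution 2, ...)
   and recursion stops when a group holds a single sub-data structure,
   which can then be processed separately by one core. */
static void distribute(const int *subs, int lo, int hi, int first_core)
{
    if (hi - lo <= 0)
        return;
    if (hi - lo == 1) {               /* one sub-data structure: done */
        notify_core(first_core, subs[lo]);
        return;
    }
    int mid = lo + (hi - lo) / 2;     /* the split made at this level */
    distribute(subs, lo, mid, first_core);
    distribute(subs, mid, hi, first_core + (mid - lo));
}

int main(void)
{
    int subs[] = { 12, 47, 70, 93 };  /* made-up sub-data structure ids */
    distribute(subs, 0, 4, 0);        /* spread over cores 0..3         */
    return 0;
}
```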
The above lock protection may be a spin lock. The flow of using the spin lock is shown in FIG. 10: the memory resources are locked, and when multiple cores access the common data structure, each core must acquire the spin lock before accessing it. If acquisition succeeds, the core operates on the shared resource and then unlocks the memory resource. If the spin lock is held by another core when a core tries to acquire it, the thread on the requesting core does not sleep but keeps checking whether the lock has been released. While a spin lock is spinning, the CPU does no useful work and is in fact wasting CPU time, so spin locks are suitable for scenarios in which the access time to the common data structure is short; more precisely, spin locks are suitable for scenarios in which the access time to the shared resource is shorter than, or comparable to, the task-switching time. Therefore, in this embodiment, only the memory resources requested by the cores in step 13 are protected by the lock.
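The spin-lock flow of FIG. 10 can be sketched with C11 atomics; in line with the text, only the shared memory-resource step is guarded, and the lock and pool-access names are placeholders.

```c
#include <stdatomic.h>

/* A minimal spin lock: the acquiring thread never sleeps, it keeps
   testing the flag until the holder releases it. */
typedef struct { atomic_flag f; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
        ;   /* busy-wait: acceptable only for very short critical sections */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->f, memory_order_release);
}

static spinlock_t pool_lock = { ATOMIC_FLAG_INIT };

/* Example: only the shared free-block pool (the memory resources of
   step 13) is protected; processing a core's own sub-data structure
   takes no lock at all. */
void *grab_block_from_shared_pool(void *(*pop)(void))
{
    spin_lock(&pool_lock);
    void *block = pop();      /* operate on the shared resource */
    spin_unlock(&pool_lock);
    return block;
}
```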
FIG. 11 is a schematic structural diagram of an apparatus for processing a common data structure according to an embodiment of the present invention. As shown in FIG. 11, the apparatus includes a distribution module 111 and a processing module 112. The distribution module 111 is configured to distribute each sub-data structure in the common data structure to a corresponding core on the multi-core processor through the common part in the common data structure, where the common part includes the data range of each sub-data structure. The processing module 112 is located on the corresponding core of the multi-core processor and is configured to separately process the distributed sub-data structures.
The apparatus for processing a common data structure provided by this embodiment of the present invention may further include an underlying processing module 113, configured to allocate, according to the memory management part in the common data structure, memory resources to the corresponding cores on the multi-core processor for processing the sub-data structures. The underlying processing module 113 may specifically be configured to allocate memory resources to the corresponding cores on the multi-core processor according to the memory management part in the common data structure in the case where memory space is reserved.
The apparatus for processing a common data structure provided by this embodiment of the present invention may further include a static allocation module 114, configured to allocate corresponding cores evenly to the sub-data structures in the common data structure according to the number of sub-data structures in the common data structure and the core resources on the multi-core processor, or to allocate corresponding cores with weights according to the distribution characteristics of the resources consumed by each sub-data structure in the common data structure. In this case, the distribution module 111 may specifically be configured to distribute each sub-data structure in the common data structure, through the common part in the common data structure, to the corresponding core allocated by the static allocation module.
The apparatus for processing a common data structure provided by this embodiment of the present invention may further include a dynamic allocation module 115, configured to compare the number of a sub-data structure to be processed with a full binary tree to obtain a comparison path, where the nodes of the full binary tree store the numbers of the sub-data structures located at specified boundaries in the numbering space of the sub-data structures in the common data structure, and the number of boundaries is one less than the number of cores on the multi-core processor. In this case, the distribution module 111 may specifically be configured to distribute each sub-data structure in the common data structure, through the common part in the common data structure, to the core whose number is the value of the comparison path. The underlying processing module 113 may include a resource allocation sub-module 116 and a lock protection sub-module 117, where the resource allocation sub-module 116 is configured to, when a core processing a sub-data structure requests memory resources, allocate memory resources to the corresponding core on the multi-core processor according to the memory management part in the common data structure, and the lock protection sub-module 117 is configured to apply lock protection to the memory resources requested by the core processing a sub-data structure.
Alternatively, as shown in FIG. 12, the underlying processing module 113 may include a recording sub-module 121 and a management sub-module 122. The recording sub-module 121 is located on each core and is used to record the amount of memory in use in each memory block of that core and to sort all memory blocks in the core by the amount of memory in use. The management sub-module 122 is located on each core and is used to request memory from the memory blocks in descending order of usage rate and, when memory is released, to move the data in the memory block with the lowest usage rate into the memory block with the highest usage rate that is not yet full.
In the above embodiments, the apparatus for processing a common data structure distributes, through the distribution module, each sub-data structure in the common data structure to a corresponding core on the multi-core processor for separate processing, which avoids load imbalance between cores, enables the cores of the multi-core processor to process the common data structure in parallel, fully exploits the advantages of the multi-core processor, and improves its processing efficiency.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware controlled by program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc. Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1、 一种处理公共数据结构的方法, 其特征在于, 包括: A method of processing a common data structure, comprising:
通过公共数据结构中的公共部分将所述公共数据结构中的各个子数据结 构分发给多核处理器上相应的核; 所述公共部分包含有所述各个子数据结构 的数据范围;  Distributing each sub-data structure in the common data structure to a corresponding core on the multi-core processor through a common portion in the common data structure; the common portion includes a data range of the respective sub-data structures;
所述多核处理器上相应的核对分发的子数据结构进行单独处理。  The sub-data structures of the corresponding collated distributions on the multi-core processor are processed separately.
2、 根据权利要求 1所述的处理公共数据结构的方法, 其特征在于, 所述 根据所述公共数据结构中的内存管理部分为所述多核处理器上相应的核 分配内存资源。  2. The method of processing a common data structure according to claim 1, wherein said allocating a memory resource to a corresponding core on said multi-core processor according to a memory management portion in said common data structure.
3、 根据权利要求 1或 2所述的处理公共数据结构的方法, 其特征在于, 还包括:  The method for processing a common data structure according to claim 1 or 2, further comprising:
或者根据所述公共数据结构中各个子数据结构消耗资源的分布特征, 为 所述公共数据结构中的各个子数据结构加权分配相应的核; Or weighting the corresponding cores for each sub-data structure in the common data structure according to the distribution characteristics of the consumption resources of each sub-data structure in the common data structure;
所述通过公共数据结构中的公共部分将所述公共数据结构中的各个子数 据结构分发到多核处理器上相应的核包括:  Distributing each sub-data structure in the common data structure to a corresponding core on the multi-core processor through a common portion in the common data structure includes:
通过公共数据结构中的公共部分将所述公共数据结构中的各个子数据结 构分发到分配的相应的核。  Each sub-data structure in the common data structure is distributed to the assigned corresponding core by a common portion in the common data structure.
4、 根据权利要求 1所述的处理公共数据结构的方法, 其特征在于, 还包 括:  4. The method of processing a common data structure according to claim 1, further comprising:
将待处理的子数据结构的编号与满二叉树进行比较, 得到比较路径; 所 于指定边界的子数据结构的编号, 所述边界的个数比所述多核处理器上的核 的个数少 1 ; Comparing the number of the sub-data structure to be processed with the full binary tree to obtain a comparison path; the number of the sub-data structure at the specified boundary, the number of the boundary being larger than the core on the multi-core processor The number is less than 1;
所述通过公共数据结构中的公共部分将所述公共数据结构中的各个子数 据结构分发到多核处理器上相应的核包括:  Distributing each sub-data structure in the common data structure to a corresponding core on the multi-core processor through a common portion in the common data structure includes:
通过公共数据结构中的公共部分将所子数据结构编号对应的子数据结构 分发到编号为所述比较路径的值的核。  The sub-data structures corresponding to the sub-data structure numbers are distributed to the core numbered as the value of the comparison path by a common part in the common data structure.
5、 根据权利要求 2所述的处理公共数据结构的方法, 其特征在于, 根据 所述公共数据结构中的内存管理部分为所述多核处理器上相应的核分配内存 资源包括:  The method for processing a common data structure according to claim 2, wherein the allocating memory resources for the corresponding cores on the multi-core processor according to the memory management portion of the common data structure comprises:
在处理子数据结构的核申请内存资源的情况下, 根据所述公共数据结构 中的内存管理部分为所述多核处理器上相应的核分配内存资源;  In the case of processing a core application memory resource of the sub-data structure, allocating a memory resource for a corresponding core on the multi-core processor according to a memory management portion in the common data structure;
对处理子数据结构的核申请的内存资源进行锁保护。  Locks the memory resources of the core application that handles the sub-data structure.
6、 根据权利要求 2所述的处理公共数据结构的方法, 其特征在于, 根据 所述公共数据结构中的内存管理部分为所述多核处理器上相应的核分配内存 资源包括:  The method for processing a common data structure according to claim 2, wherein the allocating memory resources for the corresponding cores on the multi-core processor according to the memory management portion of the common data structure comprises:
在预留内存空间的情况下, 根据所述公共数据结构中的内存管理部分为 所述多核处理器上相应的核分配内存资源。  In the case of reserving the memory space, memory resources are allocated to the corresponding cores on the multi-core processor according to the memory management portion in the common data structure.
7、 根据权利要求 2所述的处理公共数据结构的方法, 其特征在于, 根据 所述公共数据结构中的内存管理部分为所述多核处理器上相应的核分配内存 资源包括:  The method for processing a common data structure according to claim 2, wherein the allocating memory resources for the corresponding cores on the multi-core processor according to the memory management portion of the common data structure comprises:
各核记录各自内存分块中已用内存的数量, 并将各自的所有内存分块按 已用内存数量排序;  Each core records the amount of used memory in its respective memory partition, and sorts all of its memory partitions by the amount of used memory;
各核按照内存使用率的高低顺序从内存分块中申请内存, 在译放内存的 情况下, 将使用率最低的内存分块中的数据搬移至内存使用率最高且没有占 满的内存分块。  Each core applies for memory from the memory partition according to the order of memory usage. In the case of deciphering memory, the data in the memory partition with the lowest usage rate is moved to the memory partition with the highest memory usage and no full memory. .
8、 一种处理公共数据结构的装置, 其特征在于, 包括:  8. An apparatus for processing a common data structure, comprising:
分发模块, 用于通过公共数据结构中的公共部分将所述公共数据结构中 的各个子数据结构分发给多核处理器上相应的核; 所述公共部分包含有所述 各个子数据结构的数据范围; a distribution module, configured to pass the common data structure through a common part in a common data structure Each of the sub-data structures is distributed to a corresponding core on the multi-core processor; the common portion includes a data range of the respective sub-data structures;
处理模块, 位于所述多核处理器上相应的核, 用于对分发的子数据结构 进行单独处理。  A processing module, located on a corresponding core of the multi-core processor, for separately processing the distributed sub-data structure.
9、 根据权利要求 8所述的处理公共数据结构的装置, 其特征在于, 还包 括:  9. The apparatus for processing a common data structure according to claim 8, further comprising:
底层处理模块, 用于根据所述公共数据结构中的内存管理部分为所述多 核处理器上相应的核分配内存资源, 以用于处理子数据结构。  And an underlying processing module, configured to allocate, according to a memory management part in the common data structure, a memory resource for a corresponding core on the multi-core processor, to process the sub-data structure.
10. The apparatus for processing a common data structure according to claim 8, further comprising:
a static allocation module, configured to evenly allocate a corresponding core to each sub-data structure in the common data structure according to the number of sub-data structures in the common data structure and the core resources on the multi-core processor, or to allocate a corresponding core to each sub-data structure in the common data structure in a weighted manner according to distribution characteristics of the sub-data structures;
wherein the distribution module is specifically configured to distribute each sub-data structure in the common data structure, through the common portion in the common data structure, to the corresponding core allocated by the static allocation module.
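To make the two static policies of claim 10 concrete, a hedged sketch: even allocation derived only from the sub-data structure count and the core count, and weighted allocation derived from per-structure distribution characteristics, here modeled as a single weight per sub-data structure. The weight values and function names are invented for the example.

#include <stdio.h>

/* Even allocation: sub-data structure i simply goes to core i mod ncores. */
static int core_for_even(unsigned sub_id, int ncores)
{
    return (int)(sub_id % (unsigned)ncores);
}

/* Weighted allocation: sub-data structures are assigned so that every core
 * receives roughly the same total weight; weights[] holds one weight
 * (for example, expected load) per sub-data structure. */
static void core_for_weighted(const double *weights, int nsubs,
                              int ncores, int *out_core)
{
    double total = 0.0, acc = 0.0;

    for (int i = 0; i < nsubs; i++)
        total += weights[i];
    for (int i = 0; i < nsubs; i++) {
        int core = total > 0.0 ? (int)(acc / total * ncores) : 0;
        if (core >= ncores)
            core = ncores - 1;
        out_core[i] = core;
        acc += weights[i];
    }
}

int main(void)
{
    const double weights[6] = { 1.0, 1.0, 4.0, 1.0, 1.0, 4.0 };
    int cores[6];

    core_for_weighted(weights, 6, 2, cores);
    for (int i = 0; i < 6; i++)
        printf("sub %d: even -> core %d, weighted -> core %d\n",
               i, core_for_even((unsigned)i, 2), cores[i]);
    return 0;
}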
11. The apparatus for processing a common data structure according to claim 8, further comprising:
a dynamic allocation module, configured to compare the number of a sub-data structure to be processed with a full binary tree to obtain a comparison path, the nodes of the full binary tree storing the numbers of the sub-data structures located at specified boundaries in the numbering space of the sub-data structures in the common data structure, the number of boundaries being one less than the number of cores on the multi-core processor;
wherein the distribution module is specifically configured to distribute each sub-data structure in the common data structure, through the common portion in the common data structure, to the core whose number is the value of the comparison path.
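A speculative sketch of the full-binary-tree dispatch of claim 11, assuming the core count is a power of two and the ncores - 1 boundary numbers are stored in breadth-first (heap) order of a balanced search tree; the comparison path, read as a binary number, is then used directly as the core number. None of these layout choices are specified by the claims themselves.

#include <stdio.h>

/* Walk an implicit full binary tree stored as an array in breadth-first
 * (heap) order: node i has children 2*i+1 and 2*i+2.  Each node holds one
 * boundary number; going left appends a 0 to the comparison path, going
 * right appends a 1.  The finished path, read as a binary number, is the
 * core the sub-data structure is sent to. */
static int core_for_sub(const unsigned *boundaries, int ncores, unsigned sub_id)
{
    int node = 0, path = 0, levels = 0;

    for (int n = ncores; n > 1; n >>= 1)   /* log2(ncores) comparisons in total */
        levels++;

    for (int l = 0; l < levels; l++) {
        int right = sub_id > boundaries[node];
        path = (path << 1) | right;
        node = 2 * node + 1 + right;
    }
    return path;
}

int main(void)
{
    /* 16 sub-data structures numbered 0..15, 4 cores, so 3 boundaries.
     * Sorted boundaries {3, 7, 11} stored in heap order of a balanced
     * search tree: root 7, children 3 and 11. */
    const unsigned boundaries[] = { 7, 3, 11 };

    for (unsigned id = 0; id < 16; id++)
        printf("sub-data structure %2u -> core %d\n",
               id, core_for_sub(boundaries, 4, id));
    return 0;
}

With these boundaries the program maps numbers 0-3 to core 0, 4-7 to core 1, 8-11 to core 2 and 12-15 to core 3, using two comparisons per dispatch.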
12. The apparatus for processing a common data structure according to claim 9, wherein the underlying processing module comprises:
a resource allocation submodule, configured to, when a core processing a sub-data structure requests memory resources, allocate memory resources to the corresponding core on the multi-core processor according to the memory management portion in the common data structure; and
a lock protection submodule, configured to apply lock protection to the memory resources requested by the core processing the sub-data structure.
13. The apparatus for processing a common data structure according to claim 9, wherein the underlying processing module is specifically configured to, when memory space is reserved, allocate memory resources to the corresponding core on the multi-core processor according to the memory management portion in the common data structure.
14. The apparatus for processing a common data structure according to claim 9, wherein the underlying processing module comprises:
a recording submodule, located on each core, configured to record the amount of used memory in each memory block of the core and to sort all memory blocks within the core by the amount of used memory; and
a management submodule, located on each core, configured to request memory from the memory blocks in descending order of memory usage and, when memory is released, to move the data in the memory block with the lowest usage to the memory block with the highest usage that is not yet full.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201080003755.7A CN102362256B (en) 2010-04-13 2010-04-13 Method and device for processing common data structure
PCT/CN2010/071736 WO2011127649A1 (en) 2010-04-13 2010-04-13 Method and device for processing common data structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/071736 WO2011127649A1 (en) 2010-04-13 2010-04-13 Method and device for processing common data structure

Publications (1)

Publication Number Publication Date
WO2011127649A1 true WO2011127649A1 (en) 2011-10-20

Family

ID=44798257

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/071736 WO2011127649A1 (en) 2010-04-13 2010-04-13 Method and device for processing common data structure

Country Status (2)

Country Link
CN (1) CN102362256B (en)
WO (1) WO2011127649A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893319A (en) * 2014-12-12 2016-08-24 上海芯豪微电子有限公司 Multi-lane/multi-core system and method


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6832378B1 (en) * 2000-06-20 2004-12-14 International Business Machines Corporation Parallel software processing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7171666B2 (en) * 2000-08-04 2007-01-30 International Business Machines Corporation Processor module for a multiprocessor system and task allocation method thereof
JP2003316408A (en) * 2002-04-04 2003-11-07 Hwacheon Machine Tool Co Ltd Controller for machine tool
CN1567187A (en) * 2003-06-11 2005-01-19 华为技术有限公司 Data processing system and method
US20090164399A1 (en) * 2007-12-19 2009-06-25 International Business Machines Corporation Method for Autonomic Workload Distribution on a Multicore Processor

Also Published As

Publication number Publication date
CN102362256B (en) 2014-07-30
CN102362256A (en) 2012-02-22

Similar Documents

Publication Publication Date Title
US8141091B2 (en) Resource allocation in a NUMA architecture based on application specified resource and strength preferences for processor and memory resources
Singh et al. Task scheduling in cloud computing
US8205208B2 (en) Scheduling grid jobs using dynamic grid scheduling policy
Maurya et al. Energy conscious dynamic provisioning of virtual machines using adaptive migration thresholds in cloud data center
CN108108245B (en) Hybrid scheduling method and system for cloud platform wide-node scientific workflow
US8527988B1 (en) Proximity mapping of virtual-machine threads to processors
KR20130100689A (en) Scalable, customizable, and load-balancing physical memory management scheme
Li et al. An energy-aware scheduling algorithm for big data applications in Spark
Ma et al. vLocality: Revisiting data locality for MapReduce in virtualized clouds
Mishra et al. Improving energy consumption in cloud
CN108132834A (en) Method for allocating tasks and system under multi-level sharing cache memory framework
Maqsood et al. Leveraging on deep memory hierarchies to minimize energy consumption and data access latency on single-chip cloud computers
Sontakke et al. Optimization of hadoop mapreduce model in cloud computing environment
Koneru et al. Resource allocation method using scheduling methods for parallel data processing in cloud
WO2011127649A1 (en) Method and device for processing common data structure
Luckow et al. Abstractions for loosely-coupled and ensemble-based simulations on Azure
Huang et al. A general novel parallel framework for SPH-centric algorithms
Hu et al. Optimizing locality-aware memory management of key-value caches
Fan et al. A scheduler for serverless framework base on kubernetes
US7925841B2 (en) Managing shared memory usage within a memory resource group infrastructure
Lin et al. Allocation and scheduling of real-time tasks with volatile/non-volatile hybrid memory systems
Giridas et al. Compatibility of hybrid process scheduler in green it cloud computing environment
Kim et al. Exploration of a PIM design configuration for energy-efficient task offloading
CN109815249A (en) The fast parallel extracting method of the large data files mapped based on memory
Shrimali et al. Performance based energy efficient techniques for VM allocation in cloud environment

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201080003755.7; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10849659; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10849659; Country of ref document: EP; Kind code of ref document: A1)