CN117076130A - Method and device for concurrently processing data objects in application program - Google Patents

Method and device for concurrently processing data objects in application program

Info

Publication number
CN117076130A
Authority
CN
China
Prior art keywords
memory
memory area
data object
ith
address
Prior art date
Legal status
Pending
Application number
CN202311120253.8A
Other languages
Chinese (zh)
Inventor
吴行行
魏长征
闫莺
张辉
Current Assignee
Ant Blockchain Technology Shanghai Co Ltd
Original Assignee
Ant Blockchain Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Ant Blockchain Technology Shanghai Co Ltd
Priority to CN202311120253.8A
Publication of CN117076130A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method of concurrently processing data objects in an application, the method being performed by a computing device comprising a processor configured with a plurality of processing cores. The application defines data objects that require concurrent processing, and the computing device allocates to the application a plurality of business threads to be used for concurrently processing the data objects, where different business threads run on different processing cores. The method comprises the following steps: according to a first address length of the memory space required to be occupied by a data object, applying for a plurality of memory areas corresponding to the plurality of business threads, and storing the data object in each memory area; and, for any ith memory area among the plurality of memory areas, configuring the address information of the ith memory area to the ith business thread among the plurality of business threads, so that the ith business thread can perform business processing on the data object stored in the ith memory area according to the address information of the ith memory area.

Description

Method and device for concurrently processing data objects in application program
Technical Field
The embodiment of the specification belongs to the technical field of computers, and particularly relates to a method and a device for concurrently processing data objects in an application program.
Background
An application program can generally adopt OpenMP or another programming model to realize multi-threaded programming. The computing device may, as directed by the application, assign multiple business threads to the application so as to concurrently perform the relevant business processing on particular data objects defined in the application. If the processor of the computing device includes multiple processing cores and different business threads run on different processing cores, frequent accesses by the multiple business threads to the same cache line may cause that cache line to be repeatedly swapped in and out of the caches, which negatively affects the running efficiency of the application.
Disclosure of Invention
The purpose of this specification is to provide a method and an apparatus for concurrently processing data objects in an application program, together with a corresponding computing device and storage medium.
In a first aspect, there is provided a method of concurrently processing data objects in an application, the method being performed by a computing device including a processor configured with a plurality of processing cores, the application defining data objects therein that require concurrent processing, the computing device being allocated a plurality of business threads for the application to be used for concurrent processing of the data objects, wherein different business threads run on different processing cores, the method comprising: applying for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of a memory space required to be occupied by the data object, and storing the data object in each memory area; and for any ith memory area in the multiple memory areas, configuring the address information of the ith memory area to an ith service thread in the multiple service threads, so that the ith service thread carries out service processing on the data object stored in the ith memory area according to the address information of the ith memory area.
In a second aspect, there is provided an apparatus for concurrently processing data objects in an application, the apparatus being deployed in a computing device, the computing device including a processor configured with a plurality of processing cores, the application defining therein data objects requiring concurrent processing, the computing device being allocated a plurality of business threads for the application to be used for concurrently processing the data objects, wherein different business threads run on different processing cores, the apparatus comprising: the memory management unit is configured to apply for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of a memory space required to be occupied by the data object, and store the data object in each memory area; the address configuration unit is configured to configure address information of an ith memory area in the plurality of memory areas to an ith service thread in the plurality of service threads, so that the ith service thread performs service processing on a data object stored in the ith memory area according to the address information of the ith memory area.
In a third aspect, there is provided a computing device comprising a memory having executable code/instructions stored therein and a processor which, when executing the executable code/instructions, implements the method described in the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program/instruction which, when executed in a computing device, performs the method described in the first aspect.
In the technical solution provided in the embodiments of the present disclosure, a computing device includes a processor configured with a plurality of processing cores, a data object that needs to be processed concurrently is defined in an application program, and the computing device allocates to the application program a plurality of business threads to be used for concurrently processing the data object. In the case where different business threads run on different processing cores, the computing device may apply for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of the memory space required to be occupied by the data object, and store the data object in each memory area; for any ith memory area among the plurality of memory areas, the address information of the ith memory area is configured to the ith business thread among the plurality of business threads, so that the ith business thread performs business processing on the data object stored in the ith memory area according to the address information of the ith memory area. Thus, the plurality of business threads running on different processing cores can each complete business processing on the same data object in different memory areas, independently and without mutual interference, which improves the running efficiency of the application program.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present disclosure, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a computing device illustratively provided in the practice of the present description;
FIG. 2 is a schematic diagram of exemplary Cache false sharing in a processor of a computing device;
FIG. 3 is a flow chart of a method for concurrently processing data objects in an application provided in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an exemplary embodiment of a plurality of memory regions allocated to data objects requiring concurrent processing;
FIG. 5 is a schematic diagram of an exemplary embodiment of applying for a plurality of memory regions corresponding to a plurality of business threads;
FIG. 6 is a schematic diagram of an exemplary memory layout of a plurality of memory regions in a memory;
FIG. 7 is a schematic diagram of an apparatus for concurrently processing data objects in an application according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solution in the present specification better understood by those skilled in the art, the technical solution in the embodiments of the present specification will be clearly and completely described in the following with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
FIG. 1 is a schematic diagram of a computing device exemplarily provided in the practice of the present description. Referring to FIG. 1, the computing device may include a processor and a memory. One or more processing cores may be configured in the processor, for example including processing core C1 and processing core C2 as shown. A processing core includes, but is not limited to, prefetch logic for fetching instructions, decode logic for decoding instructions, and execution logic for executing instructions. The processor may also include a Cache for caching instructions and/or data, for example a multi-level cache comprising, but not limited to, levels L1, L2, and LLC, where different processing cores correspond to different caches. In addition, the processor may include a system agent configured with a Memory Encryption Engine (MEE), whose function is described in detail below.
Other functional modules may also be included in the computing device, such as a memory controller for supporting processor access to memory. The memory controller may be coupled to a system agent that includes the MEE or may be independent of the processor and memory.
In a hardware-based Trusted Execution Environment (TEE) solution, a secure memory area needs to be reserved in the memory of the computing device; for example, a predetermined number of physical memory pages are reserved as the secure memory area. Taking the secure memory area of SGX technology as an example: the computing device may, based on SGX technology, create an Enclave as a TEE for executing trusted applications. Using newly added processor instructions, the computing device may allocate a partial region of memory, the EPC (Enclave Page Cache), for the Enclave to reside in; the memory region corresponding to the EPC is the secure memory area belonging to the TEE.
The MEE in the processor may be used to encrypt and decrypt data exchanged between the processor and the secure memory area. When data related to the trusted application (the code and data in the Enclave) is sent from the processor to the secure memory area, it can be encrypted by the MEE inside the processor to obtain a corresponding ciphertext, and the ciphertext can be written into the secure memory area through the memory controller; only the MEE can decrypt the ciphertext in the secure memory area. Thus, the security boundary of the Enclave contains only the Enclave itself and the processor: neither privileged nor non-privileged software can access the Enclave, and even the operating system administrator and the virtual machine monitor (VMM, or hypervisor) cannot affect the code and data in the Enclave.
The processor typically reads data from memory (including the secure memory area) into the Cache in blocks; such a block is commonly referred to as a cache line, so the cache line is effectively the unit in which the processor reads data from memory into the Cache. For a data object containing multiple elements/units of data, such as an array, the processor may read multiple consecutive elements of the data object into the Cache at a time. If the elements of the data object are accessed in the order in which their physical memory addresses are distributed, the Cache hit rate is relatively high, which effectively reduces how often the processor must read data directly from memory through the memory controller, and thereby benefits the efficient running of applications (including trusted applications).
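The effect of access order on the Cache hit rate can be sketched with a toy model; the element and cache-line sizes below are assumptions for illustration, not values taken from this specification:

```python
# Illustrative model: count accesses that hit a cache line already loaded
# by an earlier access, assuming 8-byte elements and 64-byte cache lines.
ELEM_SIZE = 8      # assumed element size in bytes
LINE_SIZE = 64     # assumed cache-line size in bytes

def count_cache_hits(indices):
    """Count accesses whose cache line was loaded by an earlier access."""
    loaded = set()
    hits = 0
    for i in indices:
        line = (i * ELEM_SIZE) // LINE_SIZE
        if line in loaded:
            hits += 1
        else:
            loaded.add(line)
    return hits

# Sequential access loads each 8-element line once, then hits 7 more times:
print(count_cache_hits(range(16)))        # 14 hits out of 16 accesses
# A stride equal to the line size never reuses a loaded line:
print(count_cache_hits(range(0, 128, 8))) # 0 hits
```

In this model the sequential pattern misses only once per line, which is why accessing elements in address order keeps the hit rate high.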
When a processor includes multiple processing cores and the same data object defined in an application is concurrently processed by multiple business threads running on those processing cores, a problem of Cache false sharing arises because different processing cores correspond to different caches: as the processing cores access the data object independently of one another, cache lines are frequently swapped in and out, which negatively affects the running efficiency of the application.
FIG. 2 is a schematic diagram of exemplary Cache false sharing in a processor of a computing device. Referring to FIG. 2, first assume that a data object X defined in some application includes 26 elements, element 1 to element 26, stored sequentially and contiguously in memory in order of memory address from low to high; meanwhile, assume that the address length corresponding to a cache line read from memory by the processor equals the address length of the memory space occupied by 4 consecutive elements. On this basis, business thread 1 running on processing core C1 and business thread 2 running on processing core C2 can access the data object X stored in memory independently of each other, so the caches corresponding to processing core C1 and processing core C2 may hold cache lines containing the same elements, for example a cache line containing elements 1 to 4. In this case, if business thread 1 updates the element value of element 2 cached in its corresponding Cache through processing core C1, the updated element value of element 2 needs to be written back to the corresponding location in memory.
Correspondingly, since the element value of element 2 in memory has been updated, the cache line containing element 2 in the Cache corresponding to processing core C2 is invalidated. When business thread 2 then needs to access, through processing core C2, some element belonging to the same cache line as element 2, for example element 3, the processor must read the element value of element 3 from memory again even though element 3 itself was never updated, because the cache line containing element 3 in the Cache corresponding to processing core C2 has been invalidated.
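The scenario above can be sketched numerically; the 16-byte element size below is an assumption chosen so that, as in FIG. 2, four consecutive elements share one 64-byte cache line:

```python
# Sketch of FIG. 2's layout (sizes assumed for illustration): with 16-byte
# elements and 64-byte cache lines, four consecutive elements share a line.
ELEM = 16   # assumed element size so that 4 elements fit one line
LINE = 64   # assumed cache-line size

def cache_line_of(element_index):
    """0-based cache-line index holding the given 0-based element."""
    return (element_index * ELEM) // LINE

# Elements 1..4 of the description (indices 0..3) share line 0, so when core
# C1 writes element 2, the copy of that whole line in core C2's cache is
# invalidated and element 3 must be re-read from memory.
print([cache_line_of(i) for i in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```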
The embodiment of the specification at least provides a method and a device for concurrently processing data objects in an application program. The computing device comprises a processor configured with a plurality of processing cores, a data object needing concurrent processing is defined in an application program, the computing device is distributed with a plurality of business threads for the application program to be used for processing the data object concurrently, and under the condition that different business threads run on different processing cores, the computing device can apply for a plurality of memory areas corresponding to the business threads according to a first address length of a memory space required by the data object, and store the data object in each memory area; for any ith memory area in the multiple memory areas, address information of the ith memory area is configured to an ith service thread in the multiple service threads, so that the ith service thread performs service processing on data objects stored in the ith memory area according to the address information of the ith memory area. Therefore, for a plurality of business threads running on different processing cores, business processing on the same data object can be respectively finished in different memory areas independently without mutual interference, and the running efficiency of application programs is improved.
FIG. 3 is a flow chart of a method for concurrently processing data objects in an application provided in an embodiment of the present disclosure. The method may be performed by a computing device including a processor configured with a plurality of processing cores. The application may be an untrusted application that runs outside the TEE, or a trusted application that runs inside the TEE.
The application defines a data object X that requires concurrent processing, as well as the thread count of the plurality of business threads that are to be used to concurrently process the data object X. During initialization of the application in the computing device, the computing device may assign a main thread to the application, such that some or all of the steps of the method shown in FIG. 3 are performed in the main thread. In addition, the plurality of business threads to be used for concurrently processing the data object X may be determined according to the thread count defined in the application; these business threads may all be created by the main thread, or one of them may be the main thread itself, with the rest created by the main thread.
The data object X may be an array containing a plurality of elements, or may be another data structure containing a plurality of unit data.
Referring to fig. 3, the method may include, but is not limited to, the following step S301 and step S303.
Step S301: apply for a plurality of memory areas corresponding to the plurality of business threads according to the first address length Lx of the memory space required to be occupied by the data object X, and store the data object X in each memory area.
For example, see FIG. 4: the plurality of business threads to be used for concurrently processing the data object X includes business thread 1 running on processing core C1 in the processor and business thread 2 running on processing core C2 in the processor. The plurality of memory regions may include a memory region X1 corresponding to business thread 1 and a memory region X2 corresponding to business thread 2.
When the application is a trusted application, the plurality of memory areas belong to the secure memory area reserved by the computing device.
The address length of a single memory area is not less than the first address length Lx, and different memory areas contain no mutually overlapping memory addresses. To avoid, as far as possible, that the cache lines read by the processor for different processing cores contain data located at the same memory addresses, the address length of a single memory area may be an integer multiple of the second address length Lc, where the second address length Lc refers to the address length corresponding to a cache line read from memory by the processor, typically 64 bytes.
Exemplarily, the address length of a single memory region may be N×Lc, where N represents the quotient of the first address length Lx divided by the second address length Lc, rounded up.
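As a sketch of this sizing rule (the 64-byte cache-line length is an assumed default), the rounded-up region length N×Lc can be computed as:

```python
# Sketch: round the object's address length Lx up to an integer multiple of
# the cache-line length Lc, i.e. region length = N * Lc with N = ceil(Lx/Lc).
def region_length(lx, lc=64):
    n = -(-lx // lc)          # ceiling division: N = ceil(lx / lc)
    return n * lc

print(region_length(100))  # 128: a 100-byte object spans 2 full 64-byte lines
print(region_length(64))   # 64: already an exact multiple
```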
The application for a plurality of memory areas corresponding to the plurality of business threads can be completed through multiple memory application operations; that is, one memory application operation is performed for each business thread, and each business thread has its corresponding memory area applied for independently.
To avoid the relatively large time cost of performing the memory application operation multiple times, a comparatively large target memory area can instead be applied for through a single memory application operation, and the plurality of memory areas corresponding to the plurality of business threads can then be allocated from that target memory area.
Referring to FIG. 5, the application for a plurality of memory areas corresponding to the plurality of business threads may be completed through a single memory application operation by some or all of the following steps S3011 to S3015.
Step S3011: calculate a third address length Ls according to the first address length Lx of the memory space required to be occupied by the data object X, the thread count S of the plurality of business threads, and the second address length Lc corresponding to a cache line read from memory by the processor.
The third address length Ls is not less than (S−1)×N×Lc + Lx, where N represents the quotient of the first address length Lx divided by the second address length Lc, rounded up. In a more typical example, the value of the third address length Ls may be exactly (S−1)×N×Lc + Lx or S×N×Lc, so that resource waste caused by applying for excess memory can be avoided.
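The arithmetic of step S3011 can be sketched as follows; the function name and the `tight` flag distinguishing the two typical values of Ls are illustrative assumptions:

```python
# Sketch of step S3011: the two typical choices of the third address length.
def third_address_length(lx, s, lc=64, tight=True):
    """Ls = (S-1)*N*Lc + Lx when tight, else S*N*Lc, with N = ceil(Lx/Lc)."""
    n = -(-lx // lc)
    return (s - 1) * n * lc + lx if tight else s * n * lc

# Two threads and a 100-byte object give N = 2, so:
print(third_address_length(100, 2))               # 228 = 1*2*64 + 100
print(third_address_length(100, 2, tight=False))  # 256 = 2*2*64
```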
Step S3013: apply for a target memory area according to the third address length Ls.
The address length of the target memory area is not less than the third address length Ls and may, for example, be exactly Ls.
Step S3015: divide the target memory area, in order of memory addresses from low to high, into a plurality of memory areas corresponding to the plurality of business threads. For any jth memory area not arranged at the end, the fourth address length Lj of the jth memory area is not less than the first address length Lx, and the fourth address length Lj is an integer multiple of the second address length Lc.
The value of the fourth address length Lj is, for example, N×Lc, where N represents the quotient of the first address length Lx divided by the second address length Lc, rounded up. For the memory area arranged at the end, its address length is not less than the first address length Lx: for example, when the value of the third address length Ls equals (S−1)×N×Lc + Lx, its address length may be Lx; for another example, when the value of Ls equals S×N×Lc, its address length may be N×Lc.
The data object X may be written into each memory area in order of memory addresses from low to high. For any jth memory area not arranged at the end, when the fourth address length Lj of the jth memory area is greater than the first address length Lx, a predetermined placeholder may be written at the memory addresses in the jth memory area not occupied by the data object X. For the memory area arranged at the end, when its address length is greater than Lx, the predetermined placeholder may, but need not, be written at the memory addresses in that memory area not occupied by the data object X.
For example, see FIG. 6: the target memory area is divided sequentially into memory area X1 and memory area X2 in order of memory addresses from low to high, and the address lengths of memory area X1 and memory area X2 are both greater than the first address length Lx and are integer multiples of the second address length Lc. After the data object X is written into memory area X1 and memory area X2 respectively, in order of memory addresses from low to high, the predetermined placeholder is written at the remaining memory addresses in memory area X1 and memory area X2 not occupied by the data object X, where the placeholder is a preset character that will not be accessed by the application.
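Steps S3013 through S3015 and the placeholder rule can be sketched by simulating the target memory area as a byte buffer; using the zero byte as the placeholder is an assumption made purely for illustration:

```python
# Sketch of the single-application layout: one target area of length S*N*Lc,
# divided into S per-thread regions of length N*Lc, each holding a copy of
# the data object X with placeholder bytes (assumed 0x00) filling the rest.
def build_regions(data, s, lc=64):
    """Copy `data` into s per-thread regions inside one target byte buffer."""
    lx = len(data)
    n = -(-lx // lc)                   # N = ceil(Lx / Lc)
    region_len = n * lc                # fourth address length Lj = N * Lc
    target = bytearray(s * region_len) # zero-filled = placeholder-filled
    offsets = [i * region_len for i in range(s)]
    for off in offsets:
        target[off:off + lx] = data    # data object X written at region start
    return target, offsets

target, offsets = build_regions(b"hello world", s=2)
print(offsets)                    # [0, 64]: regions start one cache line apart
print(bytes(target[64:64 + 11]))  # b'hello world'
```

Because each region begins on its own cache-line multiple, the two copies never share a cache line, which is the point of the padding.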
Referring back to fig. 3, in step S303, for any ith memory area in the plurality of memory areas, address information of the ith memory area is configured to an ith service thread in the plurality of service threads, so that the ith service thread performs service processing on the data object stored in the ith memory area according to the address information of the ith memory area.
The address information of the i-th memory area may be represented by the start address of the i-th memory area.
If the multiple memory areas are continuous, that is, the multiple memory areas are obtained by dividing the target memory area of a single application, the address information of the ith memory area may also be represented by combining the starting address of the memory area arranged at the first position and the address offset of the starting address of the ith memory area relative to the starting address of the memory area arranged at the first position.
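The offset form of the address information can be sketched as follows; the base address value is an arbitrary assumption for illustration:

```python
# Sketch: the ith region's start address equals the first region's start
# address plus an offset of i * N * Lc bytes (regions of length N*Lc).
def region_offset(i, lx, lc=64):
    """Byte offset of the ith region relative to the first region's start."""
    n = -(-lx // lc)
    return i * n * lc

base = 0x1000  # assumed start address of the first region, for illustration
print(hex(base + region_offset(0, lx=100)))  # 0x1000
print(hex(base + region_offset(1, lx=100)))  # 0x1080 (offset 128 = 2*64)
```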
In the case where the data object is an array comprising a plurality of elements, any ith business thread may be permitted to business process some or all of the plurality of elements, depending on the business logic defined by the application itself.
By way of example, if it is desired to count the occurrence frequency of 26 letters, such as letters a-z, in a document by an application program, the data object X may be an array including 26 elements, which 26 elements sequentially correspond to the 26 letters and are initialized to 0.
In one specific example, when the document needs to be processed by the application, the first k1 lines of the document may be designated for processing by business thread 1, and the remaining k2 lines, excluding the first k1 lines, for processing by business thread 2. For any kth letter of the 26 letters, business thread 1 can traverse the first k1 lines searching for the kth letter, and each time one occurrence of the kth letter is found, perform an add-1 operation on the element value of the kth element of the data object X in the memory area X1 corresponding to business thread 1; business thread 2 processes the 26 elements in memory area X2 in a similar manner. In this case, a single business thread is allowed to access all elements in the data object X.
In another specific example, when the document needs to be processed by the application, business thread 1 may be designated to count the frequency of the 13 letters a through m, and business thread 2 the frequency of the 13 letters n through z. For any kth letter of the letters a through m, business thread 1 can traverse the document searching for that letter, and each time one occurrence is found, perform an add-1 operation on the element value of the kth element of the data object X in the memory area X1 corresponding to business thread 1; for any kth letter of the letters n through z, business thread 2 can traverse the document searching for that letter, and each time one occurrence is found, perform an add-1 operation on the element value of the (13+k)th element of the data object X in the memory area X2 corresponding to business thread 2. In this case, a single business thread is allowed to access a pre-specified portion of the elements in the data object X.
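The second example can be sketched in Python (for illustration only; real business threads would run on separate processing cores): each thread counts its half of the alphabet into its own private copy of the 26-element array, standing in for memory areas X1 and X2:

```python
# Sketch: two threads, each counting half the alphabet into its own private
# 26-element array (standing in for the per-thread memory regions).
from threading import Thread

def count_range(text, counts, lo, hi):
    """Count letters whose 0-based alphabet index is in [lo, hi) into counts."""
    for ch in text:
        k = ord(ch) - ord('a')
        if lo <= k < hi:
            counts[k] += 1

def concurrent_letter_count(text):
    region1 = [0] * 26   # stands in for memory area X1
    region2 = [0] * 26   # stands in for memory area X2
    t1 = Thread(target=count_range, args=(text, region1, 0, 13))   # a..m
    t2 = Thread(target=count_range, args=(text, region2, 13, 26))  # n..z
    t1.start(); t2.start(); t1.join(); t2.join()
    return region1, region2

r1, r2 = concurrent_letter_count("banana nap")
print(r1[0], r1[1])   # counts of 'a' and 'b': 4 1
print(r2[13])         # count of 'n': 3
```

Because each thread writes only to its own array, the two threads never contend for the same memory, mirroring the per-region isolation described above.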
After the same data object X is processed in different memory areas, the multiple business threads can be independent and do not interfere with each other, and then the data object X stored in the multiple memory areas can be reduced, so that the preset transaction expected to be completed by the application program can be completed. Wherein the reduction process may vary depending on the specific business scenario; the process of performing reduction processing on the data objects X stored in the plurality of memory areas includes, but is not limited to, at least one of a horizontal reduction mode and a vertical reduction mode. The reduction algorithm corresponding to each reduction mode includes, but is not limited to, addition or multiplication.
The horizontal reduction mode corresponds to: after the ith business thread finishes business processing on the data object X stored in the ith memory area, the data object X stored in the ith memory area is reduced into the data object X stored in a target memory area, where the target memory area may be one of the plurality of memory areas corresponding to the plurality of business threads. In a more specific example, the data object X includes a plurality of elements, and any mth element of the data object X stored in the ith memory area may be reduced into the mth element of the data object X stored in the target memory area; for example, if the element value of the mth element in the ith memory area is v1 and the element value of the mth element in the target memory area is v2, the element value v2 may be updated to the sum of the element value v2 and the element value v1.
The vertical reduction mode corresponds to: after the ith business thread finishes business processing on the data object stored in the ith memory area, the elements other than the nth element of the data object stored in the ith memory area are reduced into the nth element. For example, for the plurality of elements included in the data object X stored in the ith memory area, a sum operation or a product operation may be performed on their element values, and the operation result may be taken as the element value of the 1st element.
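The two reduction modes can be sketched as follows (a hedged illustration; the function names and the choice of addition as the reduction algorithm are assumptions):

```python
def horizontal_reduce(region_i, target_region):
    # Horizontal reduction: fold the mth element of the ith memory
    # area into the mth element of the target memory area, here by
    # addition (v2 is updated to v2 + v1).
    for m, v1 in enumerate(region_i):
        target_region[m] += v1
    return target_region

def vertical_reduce(region_i, n=0):
    # Vertical reduction: fold the element values of the ith memory
    # area into its nth element, here via a sum (n=0 gives the
    # "1st element" case from the example above).
    region_i[n] = sum(region_i)
    return region_i

print(horizontal_reduce([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(vertical_reduce([1, 2, 3]))                  # [6, 2, 3]
```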
Based on the same conception as the foregoing method embodiments, this embodiment further provides an apparatus 700 for concurrently processing data objects in an application program. The apparatus 700 is deployed in a computing device; the computing device includes a processor configured with a plurality of processing cores; the application program defines therein data objects that need to be processed concurrently; and the computing device allocates, for the application program, a plurality of business threads to be used for concurrently processing the data objects, where different business threads run on different processing cores. Referring to fig. 7, the apparatus 700 includes: a memory management unit 701, configured to apply for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of the memory space that the data object needs to occupy, and to store the data object in each memory area; and an address configuration unit 703, configured to, for any ith memory area among the plurality of memory areas, configure the address information of the ith memory area to the ith business thread among the plurality of business threads, so that the ith business thread performs business processing on the data object stored in the ith memory area according to the address information of the ith memory area.
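One plausible way the memory management unit might size the memory areas, padding each per-thread area up to a cache-line multiple so that areas of different threads never share a cache line (the function name and the rounding policy are illustrative assumptions, not the patent's prescribed implementation):

```python
def region_sizes(first_len, num_threads, cache_line):
    # fourth_len: per-thread area size, rounded up to a multiple of
    # the cache-line length (the "second address length"), so that
    # adjacent areas never straddle the same cache line and false
    # sharing between business threads is avoided.
    fourth_len = -(-first_len // cache_line) * cache_line  # ceil-divide
    # third_len: total size of the target memory area to apply for;
    # the last area needs only the data object's own first_len bytes.
    third_len = fourth_len * (num_threads - 1) + first_len
    return third_len, fourth_len

# e.g. a 26-element array of 8-byte counters, 2 threads, 64-byte lines
third, fourth = region_sizes(26 * 8, 2, 64)  # third=464, fourth=256
```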
Embodiments of the present specification also provide a computer-readable storage medium having stored thereon a computer program/instructions which, when executed in a computing device, cause the computing device to implement the method for concurrently processing data objects in an application program provided in any of the foregoing method embodiments.
Embodiments of the present specification also provide a computing device, including a memory and a processor, where the memory stores a computer program/instructions, and the processor, when executing the computer program/instructions, implements the method for concurrently processing data objects in an application program provided in any of the foregoing method embodiments.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD by programming, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development: the source code to be compiled must be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a given logic method flow can readily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component, or even as both software modules implementing the method and structures within the hardware component.
The system, apparatus, module, or unit set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. One typical implementation device is a server system. Of course, the application does not exclude that, as computer technology advances, the computer implementing the functions of the above embodiments may be, for example, a personal computer, a laptop computer, an in-vehicle human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although one or more embodiments of the present specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one. When implemented by an actual device or end product, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data-processing environment) according to the methods shown in the embodiments or figures. The terms "comprise," "include," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the described element is not excluded. Words such as "first" and "second," where used, merely denote names and do not indicate any particular order.
For convenience of description, the above devices are described with their functions divided into various modules. Of course, when implementing one or more embodiments of the present specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, a module implementing a given function may be implemented by a combination of multiple sub-modules or sub-units, and so on. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage, or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict each other.
The foregoing is merely one or more embodiments of the present specification and is not intended to limit the one or more embodiments of the present specification. Various modifications and variations of the one or more embodiments will occur to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present specification shall fall within the scope of the claims.

Claims (13)

1. A method of concurrently processing data objects in an application, the method being performed by a computing device, the computing device including a processor configured with a plurality of processing cores, the application defining data objects therein that require concurrent processing, the computing device being assigned a plurality of business threads for the application to use in concurrently processing the data objects, wherein different business threads run on different processing cores, the method comprising:
applying for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of a memory space required to be occupied by the data object, and storing the data object in each memory area;
and for any ith memory area in the multiple memory areas, configuring the address information of the ith memory area to an ith service thread in the multiple service threads, so that the ith service thread carries out service processing on the data object stored in the ith memory area according to the address information of the ith memory area.
2. The method of claim 1, wherein applying for a plurality of memory areas corresponding to the plurality of service threads according to a first address length of the memory space required by the data object comprises:
calculating a third address length according to a first address length of a memory space occupied by the data object, the number of threads of the plurality of business threads and a second address length corresponding to a cache line read from a memory by the processor;
applying for a target memory area according to the third address length;
dividing the target memory area into the plurality of memory areas corresponding to the plurality of business threads in order of memory addresses from low to high; wherein, for any jth memory area that is not arranged at the end, a fourth address length of the jth memory area is not smaller than the first address length, and the fourth address length is an integer multiple of the second address length.
3. The method of claim 2, wherein storing the data object in each of the memory areas comprises: and writing the data object into the jth memory area according to the sequence from low memory address to high memory address.
4. A method according to claim 3, the method further comprising: and when the fourth address length of the jth memory area is larger than the first address length, writing a predetermined placeholder on a memory address which is not occupied by the data object in the jth memory area.
5. The method of claim 1, the method being performed in a main thread allocated for the application.
6. The method of claim 1, the data object being an array comprising a plurality of elements.
7. The method of claim 6, wherein the ith business thread is permitted to perform business processing on some or all of the plurality of elements.
8. The method of claim 1, the application being a trusted application running in a trusted execution environment TEE, the plurality of memory regions belonging to a secure memory region reserved in the computing device.
9. The method of any one of claims 1-8, the method further comprising: and after the ith service thread finishes service processing on the data object stored in the ith memory area, reducing the data object stored in the ith memory area to the data object stored in the target memory area.
10. The method of claim 9, the data object comprising a plurality of elements;
the step of reducing the data object stored in the ith memory area to the data object stored in the target memory area specifically includes: and reducing the mth element included in the data object stored in the ith memory area to the mth element included in the data object stored in the target memory area.
11. An apparatus for concurrently processing data objects in an application, the apparatus deployed in a computing device, the computing device including a processor configured with a plurality of processing cores, the application defining data objects therein that require concurrent processing, the computing device being assigned a plurality of business threads for the application to use in concurrently processing the data objects, wherein different business threads run on different processing cores, the apparatus comprising:
the memory management unit is configured to apply for a plurality of memory areas corresponding to the plurality of business threads according to a first address length of a memory space required to be occupied by the data object, and store the data object in each memory area;
The address configuration unit is configured to configure address information of an ith memory area in the plurality of memory areas to an ith service thread in the plurality of service threads, so that the ith service thread performs service processing on a data object stored in the ith memory area according to the address information of the ith memory area.
12. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-10.
13. A computer readable storage medium having stored thereon a computer program which, when executed in a computing device, performs the method of any of claims 1-10.
CN202311120253.8A 2023-08-31 2023-08-31 Method and device for concurrently processing data objects in application program Pending CN117076130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311120253.8A CN117076130A (en) 2023-08-31 2023-08-31 Method and device for concurrently processing data objects in application program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311120253.8A CN117076130A (en) 2023-08-31 2023-08-31 Method and device for concurrently processing data objects in application program

Publications (1)

Publication Number Publication Date
CN117076130A true CN117076130A (en) 2023-11-17

Family

ID=88711509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311120253.8A Pending CN117076130A (en) 2023-08-31 2023-08-31 Method and device for concurrently processing data objects in application program

Country Status (1)

Country Link
CN (1) CN117076130A (en)

Similar Documents

Publication Publication Date Title
US9594521B2 (en) Scheduling of data migration
KR100968188B1 (en) System and method for virtualization of processor resources
CN107015862B (en) Thread and/or virtual machine scheduling for cores with different capabilities
EP4310685A2 (en) Gpu virtualisation
US9971512B2 (en) Page compression strategy for improved page out process
CN110781016B (en) Data processing method, device, equipment and medium
US20150199276A1 (en) Pre-fetch confirmation queue
KR20080017292A (en) Storage architecture for embedded systems
TWI774703B (en) System and method for detecting and handling store stream
US20060070069A1 (en) System and method for sharing resources between real-time and virtualizing operating systems
KR102513446B1 (en) Apparatus including integrated confirmation queue circuit and operation method thereof
US20160179580A1 (en) Resource management based on a process identifier
US8751724B2 (en) Dynamic memory reconfiguration to delay performance overhead
US10387302B2 (en) Managing database index by leveraging key-value solid state device
CN117076130A (en) Method and device for concurrently processing data objects in application program
KR20120070326A (en) A apparatus and a method for virtualizing memory
US20220066830A1 (en) Compaction of architected registers in a simultaneous multithreading processor
CN106547619B (en) Multi-user storage management method and system
US9251100B2 (en) Bitmap locking using a nodal lock
Wang et al. Memory management
US8966220B2 (en) Optimizing large page processing
US20230131351A1 (en) Providing a dynamic random-access memory cache as second type memory per application process
Denning et al. The Profession of IT The Atlas Milestone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination