CN112395093A - Multithreading data processing method and device, electronic equipment and readable storage medium - Google Patents

Multithreading data processing method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN112395093A
Authority
CN
China
Prior art keywords
data
processed
instruction
instruction sequence
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011402977.8A
Other languages
Chinese (zh)
Inventor
余银
赵家众
穆涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longxin Zhongke Hefei Technology Co ltd
Original Assignee
Longxin Zhongke Hefei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longxin Zhongke Hefei Technology Co ltd filed Critical Longxin Zhongke Hefei Technology Co ltd
Priority to CN202011402977.8A priority Critical patent/CN112395093A/en
Publication of CN112395093A publication Critical patent/CN112395093A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The application provides a multithreaded data processing method and apparatus, an electronic device, and a readable storage medium. The method determines whether the number of bytes of data to be processed is less than or equal to the width of a register, the data to be processed being data shared by a plurality of threads. When the number of bytes of the data to be processed is less than or equal to the register width, an instruction sequence corresponding to the current thread is generated; the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used to process the data to be processed. The instruction sequence corresponding to the current thread is then executed. In other words, in the embodiments of the present application, when the number of bytes of the data to be processed is less than or equal to the register width, the data to be processed can be written into the instruction sequence and its read or write operation completed through the atomic operation instruction, so that multithreaded data synchronization is guaranteed without a thread lock and the performance overhead of the processor during multithreaded data synchronization is reduced.

Description

Multithreading data processing method and device, electronic equipment and readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a multithreading data processing method and device, electronic equipment and a readable storage medium.
Background
Multithreading is a technique, implemented in software or hardware, for executing multiple threads concurrently. It effectively improves the resource utilization of a Central Processing Unit (CPU) and speeds up program response, and it is widely used today.
However, concurrent execution of multiple threads also raises the problem of data synchronization. For example, if thread B accesses and modifies a constant within some data while thread A is accessing that data, thread A may end up with an erroneous access result.
The current way to solve this problem is to use a thread lock: while thread A is accessing the data, no other thread may access it, and only after thread A releases the lock can other threads access the data. This guarantees data synchronization, but using a thread lock introduces significant performance overhead in the processor.
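For reference, the thread-lock approach described above can be sketched at the software level as follows. This is only an illustrative C sketch using POSIX threads; the variable and function names are assumed for illustration and are not part of the present application.

#include <pthread.h>
#include <stdint.h>

/* Illustrative only: shared data protected by a thread lock (mutex). */
static uint64_t shared_data;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread A: read the shared data while holding the lock. */
uint64_t reader(void) {
    pthread_mutex_lock(&lock);      /* other threads are blocked here         */
    uint64_t value = shared_data;
    pthread_mutex_unlock(&lock);    /* only now may other threads access it   */
    return value;
}

/* Thread B: modify the shared data while holding the same lock. */
void writer(uint64_t new_value) {
    pthread_mutex_lock(&lock);
    shared_data = new_value;
    pthread_mutex_unlock(&lock);
}

Every access pays the cost of acquiring and releasing the lock, which is the performance overhead the present application aims to avoid.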
Disclosure of Invention
Embodiments of the present invention provide a multithreaded data processing method and apparatus, an electronic device, and a readable storage medium, which can reduce the performance overhead of a processor during multithreaded data synchronization.
In a first aspect, an embodiment of the present invention provides a multithreading data processing method, including:
determining whether the byte number of data to be processed is smaller than or equal to the width of a register, wherein the data to be processed is shared by a plurality of threads;
when the byte number of the data to be processed is smaller than or equal to the width of a register, generating an instruction sequence corresponding to a current thread, wherein the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used for realizing the processing of the data to be processed;
and executing the instruction sequence corresponding to the current thread.
In a possible design, the generating an instruction sequence corresponding to a current thread includes:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed;
the executing the instruction sequence corresponding to the current thread includes:
executing the atomic load instruction to write the data to be processed to the register.
In a possible design manner, the data to be processed is address information, and the atomic load instruction includes a load operation code, a register identifier, an address of a currently executing instruction, and an offset;
the executing the atomic load instruction to write the data to be processed to the register comprises:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
In a possible design, the generating an instruction sequence corresponding to a current thread includes:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed;
the executing the instruction sequence corresponding to the current thread includes:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
In a second aspect, an embodiment of the present invention provides a multithreading data processing apparatus applied to a RISC architecture processor, including:
the determining module is used for determining whether the byte number of the data to be processed is smaller than or equal to the width of the register, wherein the data to be processed is data shared by a plurality of threads;
the processing module is used for generating an instruction sequence corresponding to the current thread when the byte number of the data to be processed is smaller than or equal to the width of a register, wherein the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used for realizing the processing of the data to be processed;
and the execution module is used for executing the instruction sequence corresponding to the current thread.
In one possible design, the processing module is configured to:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed;
the execution module is configured to:
executing the atomic load instruction to write the data to be processed to the register.
In a possible design manner, the data to be processed is address information, and the atomic load instruction includes a load operation code, a register identifier, an address of a currently executing instruction, and an offset; the execution module is configured to:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
In one possible design, the processing module is configured to:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed;
the execution module is configured to:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of multi-threaded data processing as provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for processing multithread data provided in the first aspect is implemented.
The multithreading data processing method, the multithreading data processing device, the electronic equipment and the readable storage medium provided by the embodiment of the application determine whether the byte number of the data to be processed is smaller than or equal to the width of the register, and when the byte number of the data to be processed is smaller than or equal to the width of the register, an instruction sequence corresponding to a current thread is generated, wherein the instruction sequence comprises an atomic operation instruction to realize the processing of the data to be processed; and executing an instruction sequence corresponding to the current thread, wherein the data to be processed is data shared by a plurality of threads. In other words, in the embodiment of the present application, when the number of bytes of the to-be-processed data is less than or equal to the width of the register, the to-be-processed data may be written into the instruction sequence, and the read or write operation of the to-be-processed data is completed through the atomic operation instruction, so that the synchronization of the multi-thread data may be ensured without using a thread lock, and the performance overhead of the processor during the multi-thread data synchronization process is reduced.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic hardware structure diagram of an electronic device provided in an embodiment of the present application;
FIG. 2 is a flowchart illustrating a multithreading data processing method according to an embodiment of the present invention;
FIG. 3 is a thread diagram of a multithreading data processing method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of program modules of a multithreading data processing apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
The multithreaded data processing method provided by the embodiments of the present application can be applied to electronic devices in various forms, such as mobile terminals, computers, vehicle-mounted terminals, and wearable devices.
Referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present application. In the embodiment of the present application, the electronic device 10 includes: a processor 101 and a memory 102; wherein:
memory 102 for storing computer-executable instructions and data.
A processor 101 for executing computer executable instructions stored in the memory, processing data in the memory, and the like.
Alternatively, the memory 102 may be separate or integrated with the processor 101.
When the memory 102 is provided separately, the electronic device further includes a bus 103 for connecting the memory 102 and the processor 101.
Optionally, the processor 101 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the multithreaded data processing method disclosed in the present application can be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), for example at least one disk memory, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus in the figures of the present application is not limited to only one bus or one type of bus.
A processing system, whether it contains multiple processors or a single processor, may run a number of "threads," each of which executes program instructions independently of the other threads. Using multiple processors allows multiple tasks, functions, and even applications to be handled more efficiently and faster. Using multiple threads or processors also means that two or more processors or threads can share the same data stored in the system.
Multithreading is a software or hardware technique for executing multiple threads concurrently; it effectively improves CPU resource utilization and speeds up program response. However, concurrent execution of multiple threads also raises the problem of data synchronization. For example, a Reduced Instruction Set Computer (RISC) processor needs multiple instructions to load a constant into a register. If thread B accesses and modifies a constant within some data (for example, modifies a function address) while thread A is accessing that data, thread A may end up with an erroneous access result. In the prior art, the common way to solve such a problem is to acquire a thread lock before thread B writes and before thread A reads: while thread A is accessing the data, thread B cannot access it, and only after thread A releases the lock can thread B access the data. This guarantees data synchronization, but using a thread lock introduces significant performance overhead in the processor.
To solve the above technical problem, the present application provides a multithreaded data processing method for this situation, which achieves multithreaded data synchronization without a thread lock and thereby reduces the performance overhead of the processor during multithreaded data synchronization.
Referring to FIG. 2, FIG. 2 is a flowchart of a multithreaded data processing method according to an embodiment of the present invention. In a possible embodiment of the present application, the multithreaded data processing method includes:
s201, determining whether the byte number of the data to be processed is smaller than or equal to the width of the register, wherein the data to be processed is shared by a plurality of threads.
The data to be processed is data shared by multiple threads; that is, the data to be processed at a given memory address can be processed by multiple threads.
A register is a small storage area in the CPU used to hold data; it temporarily stores operands participating in an operation and the operation results. It may be any of a general-purpose register, a special-purpose register, or a control register.
In the embodiment of the present application, it can be determined in advance whether the number of bytes of the data to be processed in memory is less than or equal to the width of the register. If the number of bytes is greater than the register width, a thread lock can still be used in the thread to ensure that it is not interrupted by other threads while reading or modifying the data to be processed in memory; if the number of bytes of the data to be processed in memory is less than or equal to the register width, the data to be processed can be read or modified in one step using the instruction sequence.
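As a minimal C sketch of this size check in step S201 (assuming sizeof(uintptr_t) as a stand-in for the general-purpose register width; the helper name is illustrative and not part of the present application):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring step S201: the lock-free instruction-sequence
 * path is taken only when the shared datum fits in one general-purpose register;
 * otherwise the code falls back to a conventional thread lock. */
static bool fits_in_register(size_t nbytes) {
    return nbytes <= sizeof(uintptr_t);   /* register width on the assumed target */
}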
In a possible implementation manner, the data to be processed is address information, for example, the data to be processed is a storage address of the target data in the memory.
It will be appreciated that if the register word length is 32 bits, then for a 64-bit variable M the CPU must read the variable in two steps (for example, first the upper 32 bits and then the lower 32 bits), and the same applies to writes. Without a thread lock, the following problem exists:
For example, when thread A reads the upper 32 bits of variable M, only those upper 32 bits are guaranteed not to be modified by other threads during the read; thread B can still modify the lower 32 bits of M. The upper 32 bits read by thread A may therefore be the original value while the lower 32 bits are the value modified by thread B, producing dirty data and a program error. The present application therefore requires that the number of bytes of the data to be processed in memory be less than or equal to the register width, so that thread A can read all of the data to be processed in one step and the data cannot be partially modified by thread B during the read.
S202, when the byte number of the data to be processed is smaller than or equal to the width of the register, generating an instruction sequence corresponding to the current thread, wherein the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used for realizing the processing of the data to be processed.
Once an atomic operation instruction starts executing, it runs to completion without switching to any other thread; that is, the atomic operation instruction is not interrupted by other threads while it is running.
In a possible implementation manner, the data to be processed may be directly written into the instruction sequence when the instruction sequence corresponding to the current thread is generated.
In another possible implementation, the data to be processed may be stored at its corresponding position in the instruction sequence before the current thread executes. When the current thread executes and the corresponding instruction sequence is generated, the atomic operation instruction is generated based on the position of the data to be processed within the instruction sequence. For example, if in the instruction sequence the data to be processed is located below the atomic operation instruction and separated from it by other instructions at an offset of 12, then the atomic operation instruction necessarily contains information, generated from the data to be processed, indicating that the pointer pc + 12 points to the data to be processed.
And S203, executing the instruction sequence corresponding to the current thread.
When the instruction sequence corresponding to the current thread is executed, the atomic operation instruction in the instruction sequence atomically reads or modifies the data to be processed in that instruction sequence; in other words, the atomic operation instruction is not interrupted by other threads while it reads or modifies the data to be processed. Multithreaded data synchronization can therefore be guaranteed without a thread lock, and the performance overhead of the processor during multithreaded data synchronization is reduced.
For a better understanding of the present application, this embodiment refers to the MIPS instruction set. Without a thread lock, on a MIPS-architecture processor a thread may dynamically generate the following instructions to load a target address into a register:
lui reg, address_bit32Tobit47 \\ shift bits 32 to 47 of the address left by 16 bits and store them in register reg
ori reg, reg, address_bit16Tobit31 \\ bitwise OR register reg with bits 16 to 31 of the address and write the result into register reg
drotr32 reg, reg, 16 \\ rotate the contents of register reg left by 16 bits and store them in register reg
ori reg, reg, address_bit0Tobit15 \\ bitwise OR register reg with bits 0 to 15 of the address and write the result into register reg
jr reg \\ jump to the address stored in the register and execute the corresponding instruction
It will be appreciated that since the address (the data to be processed) changes dynamically, each change requires a thread (thread B) to modify the 1st, 2nd, and 4th instructions so that the latest address is loaded. Another thread (thread A) runs the instruction sequence in a loop and jumps to the latest function for execution. If thread A is interrupted before it has executed the 2nd, 3rd, and 4th instructions and execution switches to thread B, and thread B modifies the three address-carrying instructions, then when execution switches back to thread A the value in register reg is neither the old address nor the new address, and the program executes incorrectly.
In this embodiment, when the number of bytes of the address (the data to be processed) is less than or equal to the register width, the address can be written directly into the instruction sequence, and the read or write operation on the data to be processed can be completed with an atomic operation instruction. Specifically, thread A can load the address (the data to be processed) into register reg in one step through an atomic load instruction, so that no matter where thread A is interrupted while executing the instruction sequence and execution switches to thread B, no data inconsistency occurs. Thread B can update the data to be processed through an atomic write instruction, so that thread A cannot read erroneous data while thread B is updating the data to be processed.
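At the programming-language level, the effect of this scheme, namely a register-width shared value that is read and updated atomically without a lock, can be sketched with C11 atomics. This is only an analogy: the present application embeds the data in the dynamically generated instruction sequence and relies on atomic load and store instructions, whereas the sketch below relies on the compiler, and the names are assumed for illustration.

#include <stdatomic.h>
#include <stdint.h>

/* Illustrative only: a shared function address no wider than one register. */
static _Atomic uintptr_t shared_addr;

/* Thread B: publish a new address with a single atomic store
 * (the role played by the atomic write / store instruction above). */
void publish(uintptr_t new_addr) {
    atomic_store_explicit(&shared_addr, new_addr, memory_order_release);
}

/* Thread A: fetch the address with a single atomic load
 * (the role played by the atomic load instruction above) and call through it. */
void dispatch(void) {
    uintptr_t addr = atomic_load_explicit(&shared_addr, memory_order_acquire);
    void (*fn)(void) = (void (*)(void))addr;
    fn();   /* always the old target or the new one, never a half-updated mix */
}

Because the load and the store are single register-width operations, thread A observes either the old address or the new one and never a partially updated value, and no thread lock is required.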
The multithreaded data processing method provided by this embodiment of the present application ensures that the data to be processed is not changed by thread B while thread A is reading it from memory. Thread A therefore reads either the data as it was before thread B's change or the data after the change is complete, and never data that is half changed by thread B and half unchanged.
Similarly, it is ensured that the data to be processed is not read by thread A while thread B is changing it in memory. That is, once thread B starts changing the data to be processed, thread A can no longer read it until thread B has finished the change, so thread A never reads the data in the middle of thread B's modification.
It should be noted that all RISC-architecture processors, such as RISC-V, LoongArch, and ARM, have this problem; they are not described individually here, and the problem can be addressed through the embodiments described in the present application.
According to the multithreaded data processing method provided by this embodiment of the present application, when the number of bytes of the data to be processed is less than or equal to the register width, an instruction sequence corresponding to the current thread can be generated, the data to be processed written into the instruction sequence, and the read or write operation on the data to be processed completed through the atomic operation instruction. Because the atomic operation instruction cannot be interrupted by other threads during execution, multithreaded data synchronization is guaranteed without a thread lock, and the performance overhead of the processor during multithreaded data synchronization is reduced.
Based on the content described in the foregoing embodiment, in a possible implementation manner, the generating an instruction sequence corresponding to the current thread in step S202 specifically includes:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed.
Executing the instruction sequence corresponding to the current thread in step S203 then specifically includes:
executing the atomic load instruction to write the data to be processed into the register.
That is, in this embodiment of the present application, the data to be processed can be written directly into the instruction sequence and loaded into the register in one step with a single atomic load instruction. Because the atomic load instruction writes the data to be processed into the register atomically, the data to be processed cannot be modified by other threads while it is being written into the register.
In a possible implementation manner, the data to be processed is address information, and the atomic load instruction includes a load opcode, a register identifier, an address of an instruction currently being executed, and an offset.
The executing the atomic load instruction to write the data to be processed into the register includes:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
In one possible implementation, the load opcode is that of a load instruction.
That is, in this embodiment of the present application, the data to be processed can be written directly into the instruction sequence and loaded into the register in one step with a single load instruction. Writing the data to be processed into the register with the load instruction ensures that it is not modified by other threads while it is being written into the register.
In another possible implementation, generating the instruction sequence corresponding to the current thread in step S202 specifically includes:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed.
Executing the instruction sequence corresponding to the current thread in step S203 then specifically includes:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
That is, in this embodiment of the present application, the data to be processed can be written directly into the instruction sequence, and its update can be completed with a single atomic write instruction. Updating the data to be processed with the atomic write instruction ensures that it is not read by other threads while the update is in progress.
In one possible embodiment, the atomic write instruction comprises a store instruction.
That is, in this embodiment of the present application, the data to be processed can be written directly into the instruction sequence, and its update can be completed with a single store instruction. Updating the data to be processed with the store instruction ensures that it is not read by other threads while the update is in progress.
For a better understanding of this embodiment of the present application, refer to FIG. 3. FIG. 3 is a thread diagram of a multithreaded data processing method according to an embodiment of the present invention.
In FIG. 3, a first thread executes a first instruction sequence to read the data to be processed in the memory and write it into a first register; a second thread executes a second instruction sequence to write the data in a second register into the data to be processed in the memory, that is, to modify the data to be processed in the memory.
It is understood that a thread, sometimes referred to as a Lightweight Process (LWP), is the smallest unit of program execution flow; a standard thread consists of a thread ID, a current instruction pointer (PC), a register set, and a stack. That is, the first register and the second register belong to two different register sets allocated in the processor.
The memory is used to temporarily store operation data in the CPU and data exchanged with an external memory such as a hard disk.
In this embodiment of the present application, when the number of bytes of the data to be processed is less than or equal to the width of the first register, the first thread can, by executing the first instruction sequence, read the data to be processed from memory in one step and write it into the first register for processing. Because the load instruction in the first instruction sequence cannot be interrupted by other threads during execution, the data to be processed is guaranteed not to be modified by other threads while the first thread is reading it, even without a thread lock. Likewise, when the number of bytes of the data to be processed in memory is less than or equal to the width of the second register, the second thread can, by executing the second instruction sequence, modify the data to be processed in memory according to the data in the second register. Because the store instruction in the second instruction sequence cannot be interrupted by other threads during execution, the data to be processed is guaranteed not to be read by other threads while the second thread is modifying it, even without a thread lock. The performance overhead of the processor during multithreaded data synchronization is therefore reduced.
According to the multithreaded data processing method provided by this embodiment of the present application, when the number of bytes of the data to be processed is less than or equal to the register width, the data to be processed is written directly into the instruction sequence. The first thread can read the data to be processed into the register for processing in one step by executing the load instruction, and the second thread can write the data in the register into memory in one step, modifying the data to be processed in memory, by executing the store instruction. Because the load instruction and the store instruction cannot be interrupted by other threads during execution, the data to be processed cannot be modified by other threads while the first thread is reading it, and cannot be read by other threads while the second thread is modifying it, even without a thread lock. The multithread synchronization problem is thus effectively solved and the performance overhead of the processor is reduced.
Based on the multithreading data processing method described in the foregoing embodiment, an embodiment of the present application further provides a multithreading data processing apparatus, and referring to fig. 4, fig. 4 is a schematic diagram of program modules of the multithreading data processing apparatus provided in an embodiment of the present invention, where, in a possible embodiment of the present application, the multithreading data processing apparatus 40 includes:
a determining module 401, configured to determine whether a byte number of data to be processed is smaller than or equal to a width of the register, where the data to be processed is data shared by multiple threads.
A processing module 402, configured to generate an instruction sequence corresponding to a current thread when a byte number of data to be processed is smaller than or equal to a width of a register, where the instruction sequence includes the data to be processed and an atomic operation instruction, and the atomic operation instruction is used to implement processing on the data to be processed.
The execution module 403 is configured to execute an instruction sequence corresponding to the current thread.
In the multithreaded data processing apparatus 40 provided in this embodiment of the present application, when the number of bytes of the data to be processed is less than or equal to the register width, the read or write operation on the data to be processed can be completed through the atomic operation instruction. Because the atomic operation instruction cannot be interrupted by other threads during execution, multithreaded data synchronization is guaranteed without a thread lock, and the performance overhead of the processor during multithreaded data synchronization is reduced.
In a possible implementation, the processing module 402 is specifically configured to:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed.
The execution module 403 is specifically configured to:
executing the atomic load instruction to write the data to be processed into the register.
In a possible implementation manner, the data to be processed is address information, and the atomic load instruction includes a load opcode, a register identifier, an address of an instruction currently being executed, and an offset; the execution module 403 is configured to:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
In a possible implementation, the processing module 402 is specifically configured to:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed; the execution module 403 is configured to:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
It should be understood that the functions and principles implemented by the functional modules in the multithread data processing apparatus 40 are consistent with the functions and principles implemented by the steps in the multithread data processing method described in the foregoing embodiment, and the detailed implementation process may refer to the description in each embodiment corresponding to the multithread data processing method, which is not described herein again.
Based on the content described in the foregoing embodiments, the embodiments of the present invention further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the content described in the foregoing embodiments of the multithread data processing method is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present application.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of multithreaded data processing for use in a RISC architecture processor, the method comprising:
determining whether the byte number of data to be processed is smaller than or equal to the width of a register, wherein the data to be processed is shared by a plurality of threads;
when the byte number of the data to be processed is smaller than or equal to the width of a register, generating an instruction sequence corresponding to a current thread, wherein the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used for realizing the processing of the data to be processed;
and executing the instruction sequence corresponding to the current thread.
2. The method of claim 1, wherein generating the instruction sequence corresponding to the current thread comprises:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed;
the executing the instruction sequence corresponding to the current thread includes:
executing the atomic load instruction to write the data to be processed to the register.
3. The method of claim 2, wherein the data to be processed is address information, and the atomic load instruction comprises a load opcode, a register identification, an address of a currently executing instruction, and an offset;
the executing the atomic load instruction to write the data to be processed to the register comprises:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
4. The method of claim 1, wherein generating the instruction sequence corresponding to the current thread comprises:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed;
the executing the instruction sequence corresponding to the current thread includes:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
5. A multithreaded data processing apparatus, for use in a RISC architecture processor, the apparatus comprising:
the determining module is used for determining whether the byte number of the data to be processed is smaller than or equal to the width of the register, wherein the data to be processed is data shared by a plurality of threads;
the processing module is used for generating an instruction sequence corresponding to the current thread when the byte number of the data to be processed is smaller than or equal to the width of a register, wherein the instruction sequence comprises the data to be processed and an atomic operation instruction, and the atomic operation instruction is used for realizing the processing of the data to be processed;
and the execution module is used for executing the instruction sequence corresponding to the current thread.
6. The apparatus of claim 5, wherein the processing module is configured to:
if the current thread performs a read operation, generating a first instruction sequence corresponding to the current thread, wherein the first instruction sequence comprises an atomic load instruction and the data to be processed;
the execution module is configured to:
executing the atomic load instruction to write the data to be processed to the register.
7. The apparatus of claim 6, wherein the data to be processed is address information, and the atomic load instruction comprises a load opcode, a register identification, an address of a currently executing instruction, and an offset; the execution module is configured to:
and acquiring the data to be processed stored at the corresponding position in the instruction sequence by using the loading operation code, the address and the offset, and loading the data to be processed into a register corresponding to the register identifier.
8. The apparatus of claim 5, wherein the processing module is configured to:
if the current thread performs a write operation, generating a second instruction sequence corresponding to the current thread, wherein the second instruction sequence comprises an atomic write instruction and the data to be processed;
the execution module is configured to:
executing the atomic write instruction to update the data to be processed in the instruction sequence.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
execution of the computer-executable instructions stored in the memory by the at least one processor causes the at least one processor to perform the multithreaded data processing method according to any one of claims 1 to 4.
10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement a method of multithreaded data processing as recited in any of claims 1-4.
CN202011402977.8A 2020-12-04 2020-12-04 Multithreading data processing method and device, electronic equipment and readable storage medium Pending CN112395093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011402977.8A CN112395093A (en) 2020-12-04 2020-12-04 Multithreading data processing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011402977.8A CN112395093A (en) 2020-12-04 2020-12-04 Multithreading data processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112395093A true CN112395093A (en) 2021-02-23

Family

ID=74604199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011402977.8A Pending CN112395093A (en) 2020-12-04 2020-12-04 Multithreading data processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112395093A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254073A (en) * 2021-05-31 2021-08-13 厦门紫光展锐科技有限公司 Data processing method and device
CN114327815A (en) * 2021-12-10 2022-04-12 龙芯中科技术股份有限公司 Atomicity keeping method, processor and electronic equipment
CN115408153A (en) * 2022-08-26 2022-11-29 海光信息技术股份有限公司 Instruction distribution method, apparatus and storage medium for multithreaded processor
CN115718622A (en) * 2022-11-25 2023-02-28 苏州睿芯通量科技有限公司 Data processing method and device under ARM architecture and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020078307A1 (en) * 2000-12-15 2002-06-20 Zahir Achmed Rumi Memory-to-memory copy and compare/exchange instructions to support non-blocking synchronization schemes
CN101231585A (en) * 2007-01-26 2008-07-30 辉达公司 Virtual architecture and instruction set for parallel thread computing
CN103299272A (en) * 2010-12-07 2013-09-11 超威半导体公司 Programmable atomic memory using stored atomic procedures
CN107111483A (en) * 2014-10-28 2017-08-29 国际商业机器公司 Control the instruction of the access to the shared register of multiline procedure processor
US20170293486A1 (en) * 2016-04-07 2017-10-12 Imagination Technologies Limited Processors supporting atomic writes to multiword memory locations & methods
CN108701027A (en) * 2016-04-02 2018-10-23 英特尔公司 Processor, method, system and instruction for the broader data atom of data width than primary support to be stored to memory

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020078307A1 (en) * 2000-12-15 2002-06-20 Zahir Achmed Rumi Memory-to-memory copy and compare/exchange instructions to support non-blocking synchronization schemes
CN101231585A (en) * 2007-01-26 2008-07-30 辉达公司 Virtual architecture and instruction set for parallel thread computing
CN103299272A (en) * 2010-12-07 2013-09-11 超威半导体公司 Programmable atomic memory using stored atomic procedures
CN107111483A (en) * 2014-10-28 2017-08-29 国际商业机器公司 Control the instruction of the access to the shared register of multiline procedure processor
CN108701027A (en) * 2016-04-02 2018-10-23 英特尔公司 Processor, method, system and instruction for the broader data atom of data width than primary support to be stored to memory
US20170293486A1 (en) * 2016-04-07 2017-10-12 Imagination Technologies Limited Processors supporting atomic writes to multiword memory locations & methods

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254073A (en) * 2021-05-31 2021-08-13 厦门紫光展锐科技有限公司 Data processing method and device
CN113254073B (en) * 2021-05-31 2022-08-26 厦门紫光展锐科技有限公司 Data processing method and device
CN114327815A (en) * 2021-12-10 2022-04-12 龙芯中科技术股份有限公司 Atomicity keeping method, processor and electronic equipment
WO2023104146A1 (en) * 2021-12-10 2023-06-15 龙芯中科技术股份有限公司 Atomicity maintaining method, processor and electronic device
CN115408153A (en) * 2022-08-26 2022-11-29 海光信息技术股份有限公司 Instruction distribution method, apparatus and storage medium for multithreaded processor
CN115408153B (en) * 2022-08-26 2023-06-30 海光信息技术股份有限公司 Instruction distribution method, device and storage medium of multithreaded processor
CN115718622A (en) * 2022-11-25 2023-02-28 苏州睿芯通量科技有限公司 Data processing method and device under ARM architecture and electronic equipment
CN115718622B (en) * 2022-11-25 2023-10-13 苏州睿芯通量科技有限公司 Data processing method and device under ARM architecture and electronic equipment

Similar Documents

Publication Publication Date Title
CN112395093A (en) Multithreading data processing method and device, electronic equipment and readable storage medium
CA2706737C (en) A multi-reader, multi-writer lock-free ring buffer
US8996845B2 (en) Vector compare-and-exchange operation
KR101581177B1 (en) Provision of extended addressing modes in a single instruction multiple data data processor
US20040076069A1 (en) System and method for initializing a memory device from block oriented NAND flash
EP2842041B1 (en) Data processing system and method for operating a data processing system
TWI808869B (en) Hardware processor and processor
CN108628638B (en) Data processing method and device
CN111208933B (en) Method, device, equipment and storage medium for data access
US8788766B2 (en) Software-accessible hardware support for determining set membership
CN104978284A (en) Processor subroutine cache
EP4152146A1 (en) Data processing method and device, and storage medium
CN115640047B (en) Instruction operation method and device, electronic device and storage medium
EP3825848A1 (en) Data processing method and apparatus, and related product
CN109416632B (en) Apparatus and method for processing data
US20030084232A1 (en) Device and method capable of changing codes of micro-controller
CN115905040B (en) Counter processing method, graphics processor, device and storage medium
US11853412B2 (en) Systems and methods for defeating stack-based cyber attacks by randomizing stack frame size
US20170192838A1 (en) Cpu system including debug logic for gathering debug information, computing system including the cpu system, and debugging method of the computing system
CN111984317A (en) System and method for addressing data in a memory
CN115774724A (en) Concurrent request processing method and device, electronic equipment and storage medium
CN115269199A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN110308933B (en) Access instruction determining method, device and storage medium
US20040205701A1 (en) Computer system, virtual machine, runtime representation of object, storage media and program transmission apparatus
US20130166887A1 (en) Data processing apparatus and data processing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination