CN113778914A - Apparatus, method, and computing device for performing data processing

Info

Publication number
CN113778914A
CN113778914A (application CN202010526781.3A)
Authority
CN
China
Prior art keywords
instruction, instructions, read, queue, memory copy
Prior art date
Legal status
Pending
Application number
CN202010526781.3A
Other languages
Chinese (zh)
Inventor
卢廷玉
郭海涛
李涛
俞柏峰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010526781.3A (publication CN113778914A)
Priority to PCT/CN2021/088556 (publication WO2021249029A1)
Publication of CN113778914A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0895Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present application relates to an apparatus, method, and computing device for performing data processing. The data processing apparatus provided by the present application includes a control unit and a cache unit coupled to the control unit. The cache unit stores, using a first queue and a second queue, a plurality of instructions received by the control unit, where the plurality of instructions include memory copy instructions and read/write instructions. The first queue is used to store memory copy instructions and the second queue is used to store read/write instructions. The control unit receives the plurality of instructions and stores them to the first queue and the second queue, respectively, according to a predetermined rule and the types of the instructions. The control unit then executes the instructions stored in the first queue and the second queue in parallel. In this way, the instructions in the two queues are executed in parallel, and the situation where read/write instructions are blocked is avoided.

Description

Apparatus, method, and computing device for performing data processing
Technical Field
The present application relates to the field of data processing, and in particular, to an apparatus, method, controller and server for performing data processing.
Background
In a computing device, a memory copy operation may be used to copy a contiguous block of data from one address in memory to another. A read instruction and a write instruction may be used together to implement the memory copy operation. However, besides the instructions related to memory copy operations, other types of instructions (for example, ordinary read/write instructions) also exist in the computing device. When multiple memory copy operations are pending, the instructions related to the memory copy operations and the other types of instructions can only be processed serially, one by one, in order of arrival. As a result, instruction execution in the whole computing device is congested, and processing each instruction takes a long time and is inefficient. Therefore, how to process instructions in parallel is an urgent technical problem to be solved.
Disclosure of Invention
The present application provides an apparatus, a method, and a computing device for performing data processing, so as to provide a technical solution for processing instructions in parallel.
In a first aspect, a data processing apparatus is provided. The apparatus includes a control unit and a cache unit coupled to the control unit. The control unit receives a plurality of instructions and stores them using a first queue and a second queue in the cache unit. Here, the plurality of instructions include memory copy instructions and read/write instructions. The control unit stores the plurality of instructions to the first queue and the second queue, respectively, according to a predetermined rule and the types of the instructions. Specifically, the first queue is used to store memory copy instructions and the second queue is used to store read/write instructions. Further, the control unit executes the instructions stored in the first queue and the second queue in parallel. Because the two queues are of different types and are scheduled independently, the memory copy instructions and the read/write instructions are executed in parallel, which avoids the situation where read/write instructions are blocked.
In one possible implementation, the predetermined rule includes the address dependency among the plurality of instructions, where the address dependency is determined based on the addresses carried in the instructions. Whether a memory copy instruction and a read/write instruction access the same page may be used to determine whether an address dependency exists between them. The apparatus may store instructions that have no address dependency to the first queue and the second queue, respectively, so that they can be executed in parallel.
In another possible implementation, whether a memory copy instruction and a read/write instruction among the plurality of instructions access the same page may be determined in order to decide whether an address dependency exists between them. If the memory copy instruction and the read/write instruction access the same page, it is determined that they have an address dependency; if they do not access the same page, it is determined that they have no address dependency. With this apparatus, it is not necessary to determine whether the address ranges of two instructions actually overlap; it is only necessary to determine whether the two instructions involve the same page, which avoids the high computational load of directly checking address overlap. Further, by adjusting the page size, the accuracy of the determination and the amount of computation can be balanced, so that the computation is reduced as much as possible while accuracy is ensured.
In another possible implementation manner, in order to determine whether two instructions relate to the same page, a read counter and a write counter may be respectively set for each page in the memory. Then, whether the memory copy instruction and the read-write instruction access the same page can be determined according to the read counter and the write counter. By utilizing the read counter and the write counter, whether each page is read and written at the same time can be effectively monitored. This enables detection of data collisions and thus determination of address dependencies in a simple and efficient manner.
In another possible implementation, when an instruction accesses a page, the values of the two counters of the page may be updated. Specifically, if it is determined that an instruction of the plurality of instructions is about to read a page, the value of the read counter is incremented; if it is determined that the instruction has finished reading the page, the value of the read counter is decremented. Likewise, if it is determined that an instruction is about to write to the page, the value of the write counter is incremented; if it is determined that the instruction has finished writing to the page, the value of the write counter is decremented. Based on the updated values of the read counter and the write counter, the read and/or write state of the page can be recorded, which in turn allows the address dependency between two instructions to be determined.
In another possible implementation, if the value of the write counter differs from its initial value, it is determined that the memory copy instruction and the read/write instruction access the same page. According to the data processing apparatus of the present application, a write counter that differs from its initial value indicates that the page is about to be written. The written data may affect a read by another instruction and may also affect a write by another instruction, which creates an address dependency between the memory copy instruction and the read/write instruction. In this way, a simple comparison determines whether the memory copy instruction and the read/write instruction have an address dependency.
In another possible implementation, a format of the memory copy instruction is provided, where the memory copy instruction includes a source address, a destination address, and a copy length. The memory copy instruction indicates that the data at the source address is to be copied to the destination address according to the copy length. With this implementation, a copy operation that would otherwise require two instructions (one read and one write) can be performed with a single memory copy instruction. The processing unit is then involved only once when the memory copy operation is performed, which reduces the resource overhead of the processing unit and the bandwidth consumed transferring data between the processing unit and the memory.
In another possible implementation, the memory copy instruction further includes a cache tag to indicate that the data is to be loaded to a cache associated with the memory. With the cache tag in the present application, data that is likely to be accessed in the future can be preloaded into the cache. Data access efficiency in the computing device is improved, and overall performance of data processing is improved.
In another possible implementation, when the copy length is large, the likelihood that the memory copy instruction has an address dependency with a read/write instruction increases. Thus, if it is determined that the copy length of the memory copy instruction exceeds a predetermined threshold, the memory copy instruction may be divided into a plurality of sub-instructions, and the sub-instructions may be stored to the first queue, as sketched below. By dividing a memory copy instruction with a large copy length into multiple sub-instructions, the data processing apparatus reduces how often address dependencies are detected between memory copy instructions and read/write instructions. This improves the parallelism of executing memory copy instructions and read/write instructions, and further improves data processing performance.
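As an illustration of this splitting, the following C++ sketch divides a copy whose length exceeds a threshold into fixed-size sub-instructions. The struct and function names are hypothetical and not taken from the patent text; this is a minimal sketch, not the claimed implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical in-flight form of a memory copy instruction (names assumed).
struct MemCopy {
    uint64_t src;   // source address
    uint64_t dst;   // destination address
    uint64_t len;   // copy length in bytes
};

// Divide a long copy into sub-instructions of at most `threshold` bytes.
// Shorter sub-instructions touch fewer pages, so the chance that any one of
// them shares a page with a pending read/write instruction drops.
std::vector<MemCopy> SplitCopy(const MemCopy& c, uint64_t threshold) {
    std::vector<MemCopy> subs;
    for (uint64_t off = 0; off < c.len;) {
        uint64_t chunk = std::min(threshold, c.len - off);
        subs.push_back({c.src + off, c.dst + off, chunk});
        off += chunk;   // each sub-instruction is then stored to the first queue
    }
    return subs;
}
```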
In another possible implementation, the concept of a direct read/write instruction is introduced. A direct read/write instruction has no address dependency with any memory copy instruction among the plurality of instructions and can therefore be executed preferentially. Specifically, direct read/write instructions may be detected among the read/write instructions based on the address dependency, and may then be executed immediately. By preferentially executing direct read/write instructions that have no address dependency with any memory copy instruction, the data processing apparatus alleviates the blocking of ordinary read/write instructions seen in the conventional scheme and improves the response speed of read/write instructions.
In another possible implementation, a read/write command for accessing the memory may be generated based on the direct read/write instruction, and the read/write result may be received from the memory. Further, the read/write result may be returned to the processing unit. With the data processing apparatus, the result of executing a direct read/write instruction can be obtained more quickly, which further improves the processing efficiency of read/write instructions.
In another possible implementation, memory copy instructions that have no address dependency with other read/write instructions may be processed preferentially. Specifically, it may be determined whether the memory copy instruction has an address dependency with any read/write instruction among the plurality of instructions; if not, the memory copy instruction is stored to the first queue. The data processing apparatus can make this determination with a simple comparison, so memory copy instructions that are independent of other read/write instructions can be quickly and effectively stored to the first queue. After being stored, the memory copy instructions in the first queue may be executed in sequence. In this way, the execution of memory copy instructions is independent of the execution of ordinary read/write instructions in the second queue, which avoids the situation where ordinary read/write instructions are blocked.
In another possible implementation, since the memory copy instruction includes a source address and a destination address, both addresses should be considered when determining the address dependency. If the pages involved by the source address and the destination address of the memory copy instruction are both different from the page involved by the read/write instruction, it is determined that the memory copy instruction and the read/write instruction have no address dependency. Memory copy instructions that do not depend on other read/write instructions can thus be found in a convenient and efficient manner.
In another possible implementation, when the memory copy instruction and the read/write instruction have an address dependency, the timing relationship in which the memory copy instruction and the read/write instruction were received also needs to be determined. In this case, the predetermined rule includes both the address dependency and the timing relationship. If the memory copy instruction and the read/write instruction are determined to have an address dependency, they may be stored to the first queue and the second queue, respectively, based on the address dependency and the timing relationship. The timing relationship indicates the original temporal order in which the memory copy instruction and the read/write instruction should be executed. Storing them to the corresponding queues based on the timing relationship ensures that the result of executing the instructions in parallel from the two queues is consistent with the result of executing them serially in the original order.
In another possible implementation, it is assumed that the timing relationship indicates that the memory copy instruction is received before the read/write instruction and the address dependency relationship indicates that the destination address of the memory copy instruction and the address of the read/write instruction access the same page. At this time, the read/write instruction needs to be stored in the second queue after the memory copy instruction has been executed. By preferentially executing the memory copy instruction, the address dependency of the memory copy instruction and the read-write instruction can be eliminated. In this manner, after the memory copy instruction has been successfully executed, the address dependencies of the memory copy instruction and the read and write instructions no longer exist. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
In another possible implementation, it is assumed that the timing relationship indicates that the memory copy instruction is received before the write instruction and the address dependency indicates that the source address of the memory copy instruction accesses the same page as the address of the write instruction. At this point, the write instruction needs to be stored to the second queue after the memory copy instruction has been executed. By preferentially executing the memory copy instruction, the address dependency of the memory copy instruction and the write instruction can be eliminated. In this manner, after the memory copy instruction has been successfully executed, the address dependencies of the memory copy instruction and the write instruction no longer exist. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
In another possible implementation, it is assumed that the timing relationship indicates that the memory copy instruction is received after the read/write instruction, and the address dependency relationship indicates that the destination address of the memory copy instruction and the address of the read/write instruction access the same page. At this time, the memory copy instruction needs to be stored in the first queue after the read/write instruction has been executed. By preferentially executing the read-write instruction, the address dependency of the memory copy instruction and the read-write instruction can be eliminated. After the read-write instruction has been successfully executed, the address dependency of the memory copy instruction and the read-write instruction no longer exists. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
In another possible implementation, it is assumed that the timing relationship indicates that the memory copy instruction is received after the write instruction and the address dependency indicates that the source address of the memory copy instruction accesses the same page as the address of the write instruction. At this time, it is necessary to store the memory copy instruction to the first queue after the write instruction has been executed. By preferentially executing the write instruction, the address dependency of the memory copy instruction and the write instruction can be eliminated. In this manner, the address dependency of the memory copy instruction and the write instruction no longer exists after the write instruction has been successfully executed. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
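The four dependency cases above reduce to a single ordering rule: whichever of the two same-page instructions was received first must finish executing before the other is enqueued. A minimal decision sketch of that rule, with hypothetical names and assuming the page-overlap test has already been performed, might look like this:

```cpp
// Queueing action for a pair of instructions, given the dependency test and
// the timing relationship (which instruction was received first).
enum class Action {
    EnqueueBoth,     // no dependency: both instructions enqueued immediately
    DeferReadWrite,  // enqueue the read/write only after the copy has executed
    DeferMemCopy     // enqueue the copy only after the read/write has executed
};

Action Resolve(bool same_page_dependency, bool copy_received_first) {
    if (!same_page_dependency)
        return Action::EnqueueBoth;
    // The earlier-received instruction runs first, which removes the
    // dependency before the later one enters its queue.
    return copy_received_first ? Action::DeferReadWrite : Action::DeferMemCopy;
}
```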
In another possible implementation, the memory copy instructions in the first queue and the read/write instructions in the second queue have no address dependency, so the instructions in the two queues can be executed in parallel. Specifically, the at least one memory copy instruction may be executed in the order in which it appears in the first queue, and, in parallel, the at least one read/write instruction may be executed in the order in which it appears in the second queue. Because the instructions are stored in their respective queues, read/write instructions are not blocked behind memory copy instructions. Further, since the instructions in the first queue and the second queue are executed in parallel, the execution efficiency of instructions accessing the memory is improved, the degree of parallel processing of instructions from the processing unit is increased, and the overall performance of data processing is improved.
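As a rough software model of this parallel execution (not the hardware implementation claimed here), the sketch below drains the two queues with independent workers while preserving FIFO order inside each queue; the types and the Execute stubs are assumptions for illustration.

```cpp
#include <cstdint>
#include <queue>
#include <thread>

struct MemCopy { uint64_t src, dst, len; };          // copy src -> dst
struct ReadWrite { uint64_t addr; bool is_write; };  // ordinary access

void Execute(const MemCopy&)   { /* issue read at src, then write at dst */ }
void Execute(const ReadWrite&) { /* issue a single read or write command */ }

// Drain one queue in FIFO order.
template <typename Instr>
void Drain(std::queue<Instr>& q) {
    while (!q.empty()) { Execute(q.front()); q.pop(); }
}

// Instructions placed in different queues have no address dependency, so the
// two queues can be drained concurrently without changing the visible result.
void RunInParallel(std::queue<MemCopy>& first, std::queue<ReadWrite>& second) {
    std::thread t1([&] { Drain(first); });
    std::thread t2([&] { Drain(second); });
    t1.join();
    t2.join();
}
```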
In a second aspect, a data processing method is provided. The method may be performed by a data processing apparatus, and in particular, a plurality of instructions including a memory copy instruction and a read-write instruction may be received by the data processing apparatus. The plurality of instructions are stored in the first queue and the second queue, respectively, according to a predetermined rule and the types of the plurality of instructions. Here, the first queue is used to store memory copy instructions and the second queue is used to store read and write instructions. The instructions stored in the first queue and the second queue are then executed in parallel.
As a possible implementation form, a processor is provided for performing the functions implemented by the data processing apparatus in the first aspect or any one of the possible implementation forms of the first aspect. As a possible implementation, a processor may be provided for performing the operational steps of the method of the second aspect described above.
In a third aspect, a memory manager is provided, where the memory manager includes a data processing apparatus, and the data processing apparatus is configured to implement the functions implemented by the data processing apparatus in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, a data processing apparatus is provided. The apparatus includes a processor interface unit, an enqueue unit coupled to the processor interface unit, a cache unit coupled to the enqueue unit, and an instruction scheduling unit coupled to the cache unit. Specifically, the processor interface unit receives a plurality of instructions including memory copy instructions and read/write instructions. The enqueue unit stores the plurality of instructions to a first queue and a second queue in the cache unit, respectively, according to a predetermined rule and the types of the instructions. Here, the first queue is used to store memory copy instructions and the second queue is used to store read/write instructions. The cache unit stores the plurality of instructions using the first queue and the second queue, and the instruction scheduling unit executes the instructions stored in the first queue and the second queue in parallel.
In a fifth aspect, a computing device is provided. The computing device includes a processing unit and a memory manager unit. The memory manager unit may perform the data processing method described in the first aspect or any possible implementation of the first aspect, or in the second aspect or any possible implementation of the second aspect.
In a sixth aspect, a computer-readable storage medium is provided, having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above aspects.
In a seventh aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
The present application can further combine to provide more implementations on the basis of the implementations provided by the above aspects.
Drawings
FIG. 1 is a schematic block diagram of a data processing apparatus provided herein;
FIG. 2 is a block diagram of a process for data processing in an exemplary implementation of the present application;
FIG. 3 is a flow chart of a method for data processing of an exemplary implementation of the present application;
FIG. 4 is a block diagram of the structure of a memory copy instruction of an exemplary implementation of the present application;
FIG. 5 is a block diagram of a process for determining address dependencies in an exemplary implementation of the present application;
FIG. 6 is a flow diagram of a method for storing a plurality of instructions to respective queues, respectively, in an exemplary implementation of the present application;
FIG. 7 is a block diagram of a hardware configuration of a data processing apparatus of an exemplary implementation of the present application; and
FIG. 8 is a block diagram of an exemplary implementation of the present application for deploying a data processing apparatus in a computing device.
Detailed Description
Preferred implementations of the present application will be described in more detail below with reference to the accompanying drawings.
The present application provides a data processing apparatus for processing instructions in parallel, so as to improve instruction processing efficiency and reduce instruction processing time. An overview of an exemplary implementation of the present application is described below with reference to fig. 1, where fig. 1 is a schematic structural diagram 100 of the data processing apparatus provided in the present application. As shown in fig. 1, a data processing apparatus 110 is provided, and the data processing apparatus 110 can be connected between a processing unit 140 and memories (including memory 142, ..., and memory 144). The processing unit 140 sends a plurality of instructions to the data processing apparatus 110, and the data processing apparatus 110 includes a control unit 120 and a cache unit 130 coupled to the control unit 120. The cache unit 130 is configured to store, using a first queue 132 and a second queue 134, the plurality of instructions received by the control unit 120.
It should be understood that the processing unit 140 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The control unit 120 may take a similar physical form to the processing unit. The cache unit 130 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
According to an exemplary implementation of the present application, the concept of a memory copy instruction is introduced. A memory copy instruction here refers to an operation of copying data from a source address to a destination address in memory using a single instruction. A new memory copy instruction may be added to the instruction set of the processing unit 140. The control unit 120 receives a plurality of instructions, which include memory copy instructions and read/write instructions. The memory copy instruction is the instruction provided according to the exemplary implementation of the present application, and the read/write instructions are the original ordinary read/write instructions in the computing device. The control unit 120 stores the received instructions to the first queue 132 and the second queue 134, respectively. The first queue 132 and the second queue 134 are two types of queues, used for memory copy instructions and read/write instructions, respectively. Based on the first queue 132 and the second queue 134, the instructions in the two queues may be executed in parallel. In this way, the two queues are scheduled independently and the memory copy instructions and read/write instructions are executed in parallel, which avoids the situation where read/write instructions are blocked.
The first queue and the second queue are used to store memory copy instructions and the other types of instructions, respectively. The queue depth of the first queue and the second queue is not limited in the present application; in a specific implementation, the two queues may be set to the same or different queue depths, and the depth of each queue may also be dynamically adjusted according to service requirements. Likewise, the number of first queues and second queues does not limit the technical solution claimed in the present application and can be set and adjusted according to service requirements.
In the following, further details of an exemplary implementation according to the present application will be described with reference to fig. 2. Fig. 2 is a block diagram 200 of a process of data processing provided herein. As shown, the data processing apparatus may receive memory copy instructions 210 and read-write instructions 220 from processing unit 140, and may set predetermined rules 230. Further, the memory copy instruction 210 may be stored to the first queue 132 and the read-write instruction 220 may be stored to the second queue 134 based on the type of the plurality of instructions and the predetermined rule 230. According to an exemplary implementation of the present application, the data processing apparatus 110 may perform a method for data processing to execute a plurality of received instructions in parallel. In the following, further details of an exemplary implementation according to the present application will be described with reference to fig. 3.
Fig. 3 is a flow chart of a method 300 of data processing provided herein. The method 300 may be performed by the data processing apparatus 110. At block 310, a plurality of instructions are received by the data processing apparatus 110 from the processing unit 140. The plurality of instructions herein include the following types: memory copy instructions 210 and read-write instructions 220. The format of the memory copy instruction according to an exemplary implementation of the present application is described with reference to fig. 4, which is a block diagram 400 of the format of the memory copy instruction 210 of an exemplary implementation of the present application. As shown in the solid line box in fig. 4, the memory copy instruction 210 may include the following components: identifier 410, source address 420, destination address 430, and copy length 440.
The identifier 410 indicates the type of the instruction. The memory copy instruction 210 indicated by the identifier 410 may be added to the instruction set of the processing unit 140. It will be appreciated that the processing unit 140 may be based on different architectures, so different instruction formats may be designed based on the architecture of the processing unit 140. For example, in a processing unit based on the x86 architecture, the memory copy instruction 210 may be represented by one identifier, while in a processing unit based on the Advanced RISC Machine (ARM) architecture, it may be represented by another identifier. In the memory copy instruction 210, the source address 420 indicates the address of the data to be copied, the destination address 430 indicates the destination address at which the data is to be stored, and the copy length 440 indicates the length of the data to be copied. Based on the instruction format shown in FIG. 4, data may be copied in memory from the source address 420 to the destination address 430 according to the copy length 440.
With the present application, a copy operation that would otherwise require two instructions (a read instruction and a write instruction) can be performed with a single memory copy instruction 210. The processing unit 140 is involved only once for the memory copy instruction 210, which reduces the resource overhead and workload of the processing unit 140. Further, the data to be copied is written directly from the source address 420 to the destination address 430 and does not need to be transferred between the processing unit 140 and the memory 142. In this way, the bandwidth requirement on the transmission path is reduced.
With continued reference to fig. 4, the memory copy instruction 210 may also include a cache tag 450 that indicates whether the data at the source address 420 in the memory 142 should be loaded into a cache associated with the memory 142 when the memory copy instruction 210 is executed. The computing device may include storage devices of several speed levels: a cache with a higher access speed and the memory 142 with a lower access speed. Typically, the processing unit 140 loads data that needs to be accessed frequently into the cache. According to an exemplary implementation of the present application, the data is loaded into the cache if the cache tag 450 is set; otherwise, no load operation is performed. The copied data is often accessed soon after the memory copy instruction 210 is executed. When the data is accessed, the processing unit 140 first checks whether the data exists in the cache; if so, the data is read directly from the cache, and if not, the data is read from the slower memory 142. With the exemplary implementation of the present application, the cache tag 450 can be used to preload data that is likely to be accessed in the future into the cache. In this way, data access efficiency is improved, thereby improving the overall performance of data processing.
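The fields of FIG. 4 could be represented in software roughly as follows. The field widths and names are assumptions for illustration only, since the real encoding depends on the processor architecture.

```cpp
#include <cstdint>

// Illustrative layout of the memory copy instruction of FIG. 4.
struct MemCopyInstruction {
    uint32_t identifier;    // 410: opcode marking this as a memory copy
    uint64_t source;        // 420: address of the data to be copied
    uint64_t destination;   // 430: address where the data is to be stored
    uint32_t length;        // 440: number of bytes to copy
    bool     cache_tag;     // 450: if set, also load the data into the cache
};
```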
Having described the details regarding receiving a plurality of instructions above, it will be described below how to process the received plurality of instructions, returning to fig. 3. At block 320 of FIG. 3, the plurality of instructions are stored to the first queue 132 and the second queue 134, respectively, according to the predetermined rule 230 and the type of the plurality of instructions. Here, the steps shown at block 320 may be performed by the data processing apparatus 110. According to an exemplary implementation of the present application, memory copy instructions 210 are stored to first queue 132, and read and write instructions 220 are stored to second queue 134.
The predetermined rule 230 may include the address dependency among the plurality of instructions, which is determined according to the addresses carried in the instructions. A memory copy instruction 210 carries two addresses, the source address 420 and the destination address 430. A read/write instruction 220 carries only one address: a read instruction carries the address of the data to be read, and a write instruction carries the address of the data to be written. For convenience of description, this address is referred to below as the address of the read/write instruction. The present application may store instructions that have no address dependency to the first queue 132 and the second queue 134, respectively, so that the instructions can be executed in parallel.
In the following, how to determine the address dependency from the addresses carried in the plurality of instructions is described with reference to fig. 5. FIG. 5 is a block diagram 500 of a process for determining address dependencies in an exemplary implementation of the present application. It will be understood that fig. 5 only describes the address of the read/write instruction 220 as an example, and operations related to the source address 420 and the destination address 430 of the memory copy instruction 210 are similar, and thus are not described in detail.
Since two queues are used to store the memory copy instruction 210 and the read/write instruction 220, only the address dependency relationship between the memory copy instruction 210 and the read/write instruction 220 needs to be determined. If multiple memory copy instructions are received, they may be stored to the first queue 132 in the order in which they were received. Similarly, the received multiple read and write instructions may also be stored in the second queue 134 in chronological order. It will be appreciated that a direct determination of whether the addresses of two instructions overlap will result in a higher computational effort. According to an exemplary implementation of the present application, whether two instructions have an address dependency may be determined according to whether the two instructions access the same page. With the exemplary implementation of the present application, it is not necessary to determine whether the addresses of two instructions overlap, but rather only whether the two instructions refer to the same page. In this way, a higher computational effort resulting from directly determining whether addresses overlap can be avoided.
As shown in FIG. 5, the address 510 of the read/write instruction 220 may point to data 530 in the memory 142, and the data 530 is located in a page 520 of the memory 142. A page 520 here is a range of addresses in the memory 142. According to an exemplary implementation of the present application, the page 520 may have different sizes. The larger the page size, the coarser the granularity at which address dependencies are judged, and the more instructions will be determined to have address dependencies. The smaller the page size, the finer the granularity, but the higher the computational load. With the exemplary implementation of the present application, these two aspects can be balanced in order to select a suitable page size: by adjusting the size of the page 520, a balance is struck between the accuracy of the determination and the amount of computation, minimizing the computation while ensuring accuracy.
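A minimal sketch of this page-granularity check follows, assuming a power-of-two page size; the constant and names below are assumptions, not values from the patent.

```cpp
#include <cstdint>

constexpr uint64_t kPageShift = 12;   // assumed 4 KiB pages; tunable

constexpr uint64_t PageIndex(uint64_t addr) { return addr >> kPageShift; }

// Two addresses are treated as conflicting if they fall on the same page;
// exact byte-range overlap is never computed, which keeps the check cheap.
constexpr bool SamePage(uint64_t a, uint64_t b) {
    return PageIndex(a) == PageIndex(b);
}
```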
Hereinafter, determining whether there is an address dependency relationship between the memory copy instruction 210 and the read-write instruction 220 will be described. If the memory copy instruction 210 and the read/write instruction 220 access the same page, it may be determined that the memory copy instruction 210 and the read/write instruction 220 have an address dependency relationship. If the memory copy instruction 210 and the read/write instruction 220 do not access the same page, it may be determined that the memory copy instruction 210 and the read/write instruction 220 do not have an address dependency relationship.
It will be appreciated that although fig. 5 schematically illustrates only one page 520 of the memory 142, a plurality of other pages may also be included in the memory 142. According to an exemplary implementation of the present application, a read counter 532 and a write counter 534 may be separately provided for each page. The read counter 532 herein may indicate whether the page is being read, and the write counter 534 may indicate whether the page is being written. Further, based on the read counter 532 and the write counter 534, it can be determined whether the memory copy instruction 210 and the read/write instruction 220 access the same page.
When the memory copy instruction 210 and the read/write instruction 220 access the same page, a data conflict may occur, which requires special handling. With the read counter 532 and the write counter 534, the exemplary implementations of the present application can effectively monitor whether the page 520 is being read and written at the same time. In this way, data conflicts, and thus address dependencies, can be determined in a simple and efficient manner.
According to an exemplary implementation of the present application, the value of the read counter 532 may be incremented if it is determined that the page 520 is to be read. If it is determined that page 520 has been read, the value of read counter 532 may be decreased. Write counter 534 may perform similar operations. Specifically, if it is determined that page 520 is to be written, the value of write counter 534 may be incremented. If it is determined that page 520 has been written, the value of write counter 534 may be decreased. With the exemplary implementation of the present application, the status of the page 520 being read and/or written to is recorded based on updating the values of the read counter 532 and the write counter 534. In this manner, determination of the address dependencies of two instructions may be facilitated. According to an exemplary implementation of the present application, if the value of the write counter 534 is different from the initial value, this indicates that the page 520 is to be written. At this time, the written data may affect the reading of another instruction and may also affect the writing of another instruction, which may cause the address dependency between the memory copy instruction 210 and the read/write instruction 220.
In one example, assuming that the read-write instruction 220 is a write instruction, the source address 420 of the memory copy instruction 210 and the address of the write instruction point to the same page 520, and the initial values of the read counter 532 and the write counter 534 are both 0. Read counter 532 and write counter 534 may be updated to 1, respectively, and at this time, there is an address dependency between memory copy instruction 210 and read/write instruction 220. In another example, assume that the destination address 430 of the memory copy instruction 210 and the address of the write instruction point to the same page 520, and that the initial values of the read counter 532 and the write counter 534 are both 0. Write counter 534 may be updated to 2, at which time memory copy instruction 210 and read/write instruction 220 have an address dependency. By comparing the numerical value of the counter with the initial value, the address dependency relationship of the two instructions can be determined. In this way, the efficiency of determining address dependencies may be improved, thereby improving the overall performance of data processing.
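The counter bookkeeping described above could be sketched as follows; the map keyed by page index and the function names are assumptions used only for illustration.

```cpp
#include <cstdint>
#include <unordered_map>

struct PageCounters {
    int reads  = 0;   // instructions currently reading this page
    int writes = 0;   // instructions currently writing this page
};

std::unordered_map<uint64_t, PageCounters> page_state;  // keyed by page index

void WillRead(uint64_t page)  { ++page_state[page].reads;  }
void DidRead(uint64_t page)   { --page_state[page].reads;  }
void WillWrite(uint64_t page) { ++page_state[page].writes; }
void DidWrite(uint64_t page)  { --page_state[page].writes; }

// A write counter that differs from its initial value (0) means the page has
// a pending write, so another instruction touching the same page is treated
// as having an address dependency.
bool PageBeingWritten(uint64_t page) { return page_state[page].writes != 0; }
```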
Next, a specific process of storing a plurality of instructions to the first queue 132 and the second queue 134, respectively, based on address dependencies will be described with reference to fig. 6. FIG. 6 is a flow chart of a method 600 for storing a plurality of instructions in respective queues according to the present application. At block 610, instructions may be received, where the instructions may include memory copy instructions 210 or read-write instructions 220.
Read data generally needs to be processed immediately, so the read instructions among the read/write instructions 220 are highly sensitive to latency. To improve the response speed to read instructions, the concept of a direct read/write instruction is introduced. According to an exemplary implementation of the present application, direct read/write instructions may be detected among the read/write instructions 220 based on the address dependency. A direct read/write instruction has no address dependency with any memory copy instruction among the plurality of instructions; in other words, because the page accessed by the direct read/write instruction differs from the pages accessed by all other memory copy instructions, executing it does not affect any memory copy instruction, and it can be executed preferentially. With the exemplary implementation of the present application, a direct read/write instruction that has no address dependency with other instructions is executed preferentially, which alleviates the blocking of read/write instructions in prior schemes and improves the response speed of read/write instructions.
It will be appreciated that the write instructions among the read/write instructions 220 typically do not need an immediate response: the data to be written may be placed in an intermediate storage space and written to the corresponding address in the memory 142 at predetermined intervals or according to other trigger conditions. Although a write instruction need not be executed immediately from a response-latency perspective, too many instructions accumulating in the second queue 134 may cause the second queue 134 to overflow and/or lead to other complex conditions. With the exemplary implementation of the present application, executing direct read/write instructions without address dependencies as early as possible keeps the computing system in a good state.
According to an exemplary implementation of the present application, the direct read and write instructions do not have to be stored to the second queue 134, but may be executed immediately. If the received instruction is a read-write instruction and the read-write instruction has no address dependencies with any of the memory copy instructions, as shown at block 620, the method 600 proceeds to block 622 for direct execution of the direct read-write instruction. Specifically, when the direct read-write instruction is a read instruction, a read command corresponding to the read instruction may be generated and read data may be received from the memory 142. When the direct read and write instruction is a write instruction, a write command corresponding to the write instruction may be generated and data may be written to the memory 142 and a response to the write command may be received from the memory 142. Direct read and write instructions that do not have an address dependency with other memory copy instructions may be preferentially executed. In this way, the situation that the read-write instruction is blocked can be relieved, and the time delay of the read-write instruction can be reduced.
According to an exemplary implementation of the present application, each of the memory copy instructions of the plurality of instructions may be processed sequentially to store the memory copy instruction to the first queue 132. Hereinafter, only the memory copy instruction 210 will be described as an example. As shown at block 630, if the received instruction is a memory copy instruction 210 and the memory copy instruction 210 does not have an address dependency with any of the read and write instructions, the method 600 proceeds to block 632 to store the memory copy instruction 210 to the first queue 132. Specifically, it may be determined whether the memory copy instruction 210 and the read/write instruction in the plurality of instructions have an address dependency relationship in the manner described above. If it is determined that the memory copy instruction 210 does not have an address dependency with any of the read and write instructions of the plurality of instructions, the memory copy instruction 210 may be stored directly to the first queue 132. In one example, assuming that the source address 420 and the destination address 430 of the memory copy instruction 210 both refer to pages that are different from the pages to which the address of any read-write instruction refers, there is no address dependency between the memory copy instruction 210 and any read-write instruction. At this point, the memory copy instruction 210 may be stored to the first queue 132.
With the exemplary implementation of the present application, it can be determined whether there is an address dependency between the memory copy instruction 210 and other read/write instructions only by simple comparison. Memory copy instructions 210 that are independent of other read and write instructions may be quickly and efficiently stored to the first queue 132. After the memory copy instructions 210 are stored in the first queue 132, the memory copy instructions in the first queue 132 may be executed in sequence. In this way, the execution of the memory copy instruction 210 is independent of the execution of the normal read-write instruction 220, thereby avoiding a situation where the normal read-write instruction 220 is blocked.
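Putting the branches of FIG. 6 together, the enqueue decision could be sketched as below. The dependency predicates are stubbed out and all names are hypothetical; the dependent cases are resolved according to the timing rules discussed next.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical unified instruction record used only for this sketch.
struct Instr {
    bool     is_mem_copy;
    uint64_t src, dst, len;   // used when is_mem_copy
    uint64_t addr;            // used for an ordinary read/write
    bool     is_write;
};

std::queue<Instr> first_queue;    // memory copy instructions
std::queue<Instr> second_queue;   // read/write instructions

// Page-based dependency tests against currently pending instructions (stubs).
bool DependsOnPendingMemCopy(const Instr&)   { return false; }
bool DependsOnPendingReadWrite(const Instr&) { return false; }
void ExecuteImmediately(const Instr&)        { /* issue memory command now */ }
void ResolveByTiming(const Instr&)           { /* defer per timing rules   */ }

void Enqueue(const Instr& instr) {
    if (!instr.is_mem_copy) {
        if (!DependsOnPendingMemCopy(instr))
            ExecuteImmediately(instr);   // block 622: a "direct" read/write
        else
            ResolveByTiming(instr);      // wait, then store to second_queue
    } else {
        if (!DependsOnPendingReadWrite(instr))
            first_queue.push(instr);     // block 632: no dependency
        else
            ResolveByTiming(instr);      // wait, then store to first_queue
    }
}
```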
According to the exemplary implementation manner of the present application, when the memory copy instruction 210 and the read/write instruction 220 have an address dependency relationship, it is further required to determine a timing relationship in which the memory copy instruction 210 and the read/write instruction 220 are received. At this time, the predetermined rule includes an address dependency relationship and a timing relationship. Assuming that the source address 420 or the destination address 430 of the memory copy instruction 210 and the address of a read/write instruction relate to the same page, the memory copy instruction 210 and the read/write instruction have an address dependency relationship. Thus, the timing relationship of the memory copy instruction 210 and the read/write instruction 220 needs to be further examined to determine how to store the two instructions to the respective queues. The timing relationship indicates the original temporal order in which the memory copy instruction 210 and the read-write instruction 220 should be executed. Storing the memory copy instruction 210 and the read/write instruction 220 to the respective queues based on the timing relationship can ensure that the result of executing a plurality of instructions in parallel based on the two queues is consistent with the result when executing a plurality of instructions in series in the original time order.
According to an example implementation of the present application, if the read/write instruction 220 and the memory copy instruction 210 have an address dependency relationship, then as shown at block 640, the timing relationship at which the read/write instruction 220 and the memory copy instruction 210 are received may be determined. It will be appreciated that the timing relationship indicates the order in which the two instructions are received, and thus when storing the memory copy instruction 210 and the read-write instruction 220 into respective queues, the results of executing the instructions in the two queues in parallel should be made consistent with the results of executing the two instructions sequentially in the timing relationship. Further, as shown in block 650, the memory copy instruction 210 and the read/write instruction 220 may be stored to the first queue 132 and the second queue 134, respectively, based on the timing relationship and the address dependency relationship.
It will be appreciated that the memory copy instruction 210 may be received either before or after the read/write instruction 220. Further, the memory copy instruction 210 includes a source address 420 and a destination address 430, while the read/write instruction 220 is either a read instruction or a write instruction, whose address is the read address of the read instruction or the write address of the write instruction, respectively. Thus, the address dependency between the memory copy instruction 210 and the read/write instruction 220 should be discussed case by case.
In accordance with an exemplary implementation of the present application, consider first the case where the memory copy instruction 210 is received before the read/write instruction 220. Assume that the timing relationship indicates that the memory copy instruction 210 is received before the read/write instruction 220 and that the address dependency indicates that the destination address 430 of the memory copy instruction 210 and the address of the read/write instruction 220 refer to the same page. In this case, the read/write instruction 220 needs to access data at the destination address 430 that is modified by the memory copy instruction 210. Therefore, whether the read/write instruction 220 is a read instruction or a write instruction, the memory copy instruction 210 must be executed first and the read/write instruction 220 executed afterwards; otherwise, if the read/write instruction 220 were executed first, the data would not be accessed in the original order, causing incorrect results. Accordingly, the read/write instruction 220 is stored to the second queue 134 after the memory copy instruction 210 has been executed.
In this case, the memory copy instruction 210 may be executed immediately, and the read and write instructions 220 may be stored to the second queue 134 after the memory copy instruction 210 has been executed. With the exemplary implementation of the present application, by preferentially executing the memory copy instruction 210, the address dependency of the memory copy instruction 210 and the read/write instruction 220 can be eliminated. In this manner, after the memory copy instruction 210 has been successfully executed, the address dependencies of the memory copy instruction 210 and the read and write instruction 220 no longer exist. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
According to an exemplary implementation of the present application, assume that the timing relationship indicates that the memory copy instruction 210 is received before the read-write instruction 220, and the address dependency relationship indicates that the source address 420 of the memory copy instruction 210 and the address of the read-write instruction 220 refer to the same page. In this case, it is necessary to distinguish whether the read-write instruction 220 is a read instruction or a write instruction. If the read-write instruction 220 is a read instruction, the source address 420 of the memory copy instruction 210 and the address of the read instruction refer to the same page; two read operations involving the same page do not cause a data conflict, and thus the memory copy instruction 210 and the read-write instruction 220 can be stored in the first queue 132 and the second queue 134, respectively. If the read-write instruction 220 is a write instruction, the source address 420 of the memory copy instruction 210 and the address of the write instruction refer to the same page; one read operation and one write operation then involve the same page, and a data conflict would result. Thus, the memory copy instruction 210 needs to be executed first (i.e., the data to be copied is read from the source address 420 and stored to the destination address 430), and the write instruction is stored to the second queue 134 after the memory copy instruction 210 has been executed.
In this case, the memory copy instruction 210 may be executed immediately, and the write instruction stored to the second queue 134 after the memory copy instruction 210 has been executed. With the exemplary implementation of the present application, by preferentially executing the memory copy instruction 210, the address dependency of the memory copy instruction 210 and the write instruction can be eliminated. In this manner, after the memory copy instruction 210 has been successfully executed, the address dependencies of the memory copy instruction 210 and the write instruction no longer exist. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
The case where the memory copy instruction 210 is received before the read-write instruction 220 has been described above; hereinafter, the case where the memory copy instruction 210 is received after the read-write instruction 220 will be described. Assume that the timing relationship indicates that the memory copy instruction 210 is received after the read-write instruction 220 and the address dependency relationship indicates that the destination address 430 of the memory copy instruction 210 and the address of the read-write instruction 220 refer to the same page. If the read-write instruction 220 is a read instruction, the destination address 430 of the memory copy instruction 210 and the address of the read instruction refer to the same page, and the read instruction needs to be executed first; otherwise, the memory copy instruction 210 would overwrite the data at the destination address 430, so that the read instruction could not access the correct data. Similarly, if the read-write instruction 220 is a write instruction, the destination address 430 of the memory copy instruction 210 and the address of the write instruction refer to the same page, and the write instruction also needs to be executed first; otherwise, the memory copy instruction 210 would overwrite the data at the destination address 430, so that the two write operations would not produce the correct result. It can be seen that if the memory copy instruction 210 is received after the read-write instruction 220 and the address dependency relationship indicates that the destination address 430 of the memory copy instruction 210 and the address of the read-write instruction 220 refer to the same page, then regardless of whether the read-write instruction 220 is a read instruction or a write instruction, the read-write instruction 220 needs to be executed first in order to eliminate the address dependency. Further, after the read-write instruction 220 has been executed, the memory copy instruction 210 is stored to the first queue 132.
In this case, the read/write instruction 220 may be executed immediately, and the memory copy instruction 210 may be stored to the first queue 132 after the read/write instruction 220 has been executed. With the exemplary implementation of the present application, by preferentially executing the read-write instruction 220, the address dependency of the memory copy instruction 210 and the read-write instruction 220 can be eliminated. In this manner, after the read-write instruction 220 has been successfully executed, the address dependencies of the memory copy instruction 210 and the read-write instruction 220 no longer exist. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
According to the exemplary implementation manner of the present application, if the timing relationship indicates that the memory copy instruction 210 is received after the read/write instruction 220, and the address dependency relationship indicates that the source address 420 of the memory copy instruction 210 and the address of the read/write instruction 220 relate to the same page, it is necessary to distinguish whether the read/write instruction 220 is a read instruction or a write instruction. If the read-write instruction 220 is a read instruction, the source address 420 of the memory copy instruction 210 and the address of the read instruction refer to the same page. At this time, the two read operations involving the same page do not cause data collision, and thus the memory copy instruction 210 and the read instruction can be stored in the first queue 132 and the second queue 134, respectively. If the read-write instruction 220 is a write instruction, the source address 420 of the memory copy instruction 210 and the address of the write instruction refer to the same page. At this time, one read operation and one write operation access the same page and will result in data conflicts. Thus, a write instruction needs to be executed first (i.e., data is written to the address of the write instruction first), and the memory copy instruction 210 is stored to the first queue 132 after the write instruction has been executed.
In the latter case, the write instruction may be executed immediately, and the memory copy instruction 210 may be stored to the first queue 132 after the write instruction has been executed. With the exemplary implementation of the present application, by preferentially executing the write instruction, the address dependency between the memory copy instruction 210 and the write instruction can be eliminated. In this manner, after the write instruction has been successfully executed, the address dependency between the memory copy instruction 210 and the write instruction no longer exists. It is thus ensured that the correct results are obtained when executing the respective instructions in the two queues in parallel.
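For purposes of illustration only, the four cases above can be collapsed into a single enqueue decision. The following sketch assumes 4 KiB pages and hypothetical helper names (same_page, execute_now, enqueue); it is a software illustration of the rule, not the hardware implementation of the control unit.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assume 4 KiB pages, purely for illustration

def same_page(a: int, b: int) -> bool:
    return (a >> PAGE_SHIFT) == (b >> PAGE_SHIFT)

@dataclass
class MemCopy:
    src: int
    dst: int
    length: int

@dataclass
class ReadWrite:
    addr: int
    is_write: bool

def enqueue(mc: MemCopy, rw: ReadWrite, mc_first: bool,
            first_queue: list, second_queue: list, execute_now) -> None:
    """Apply the four cases: on a conflict, the earlier instruction is executed
    immediately and the later one is enqueued afterwards; otherwise both are
    enqueued and the two queues run in parallel."""
    conflict = same_page(mc.dst, rw.addr) or (
        rw.is_write and same_page(mc.src, rw.addr))
    if not conflict:
        first_queue.append(mc)
        second_queue.append(rw)
    elif mc_first:
        execute_now(mc)              # memory copy was received first
        second_queue.append(rw)
    else:
        execute_now(rw)              # read/write was received first
        first_queue.append(mc)
```

The same comparison would be performed before the instructions are written into their respective queues, so that the parallel execution of the two queues remains equivalent to sequential execution in the received order.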
How the memory copy instruction 210 and the read-write instruction 220 are stored to the first queue 132 and the second queue 134, respectively, based on the address dependency relationship and the timing relationship has been described above with reference to FIG. 6. It will be appreciated that the method 600 illustrated in FIG. 6 may be performed repeatedly so as to continuously process each of the plurality of instructions from the processing unit.
It will be appreciated that if the copy length of the memory copy instruction 210 is very large, the likelihood of an address dependency between the memory copy instruction 210 and the read-write instruction 220 increases, and the time needed to execute the memory copy instruction 210 also increases. Specifically, if an address dependency is detected between the memory copy instruction 210 and the read-write instruction 220, one of the two instructions needs to be executed first according to the method described above, and the other can only be stored to its queue after that execution succeeds. This reduces the parallelism of instruction execution, which in turn reduces the overall performance of the computing system. To reduce the likelihood of an address dependency between the memory copy instruction 210 and the read-write instruction 220, a memory copy instruction 210 that involves a large copy length may be divided into a plurality of sub-instructions.
According to an example implementation of the present application, if the copy length 440 of the memory copy instruction 210 exceeds a predetermined threshold, the memory copy instruction 210 may be divided into a plurality of sub-instructions. Each sub-instruction may then be processed as an ordinary memory copy instruction in the manner described above, so that the sub-instructions are stored to the first queue 132. The value of the predetermined threshold may be set according to the hardware configuration and/or the software environment of the computing device. For example, the predetermined threshold may be set to 2MB (or another value). Assuming that the copy length of the received memory copy instruction 210 reaches 4MB, the memory copy instruction 210 may be divided into two sub-instructions with copy lengths of 2MB each. The two sub-instructions may then be processed in sequence. Specifically, the address dependency of each sub-instruction with the read-write instructions 220 among the plurality of instructions may be determined separately, and each sub-instruction stored to the first queue 132.
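For purposes of illustration only, the division step may be sketched as follows; the 2MB value and the helper name split_memcopy are assumptions made for this example rather than values or interfaces defined by the present application.

```python
# Illustrative only: split a large memory copy into sub-instructions whose
# copy lengths do not exceed an assumed 2 MiB threshold.
THRESHOLD = 2 * 1024 * 1024

def split_memcopy(src: int, dst: int, length: int, threshold: int = THRESHOLD):
    """Return (src, dst, length) tuples covering the original copy, each at
    most `threshold` bytes long."""
    subs = []
    offset = 0
    while offset < length:
        chunk = min(threshold, length - offset)
        subs.append((src + offset, dst + offset, chunk))
        offset += chunk
    return subs

# A 4 MiB copy becomes two 2 MiB sub-instructions, each of which is then
# checked for address dependencies and stored to the first queue in turn.
assert len(split_memcopy(0x100000, 0x900000, 4 * 1024 * 1024)) == 2
```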
With the exemplary implementation of the present application, by dividing a memory copy instruction 210 that has a large copy length into a plurality of sub-instructions, the likelihood of detecting an address dependency between the memory copy instruction 210 and the read-write instruction 220 can be reduced. In this way, the degree of parallelism in executing the memory copy instruction 210 and the read-write instruction 220 may be increased, thereby increasing the overall performance of the computing system.
According to an exemplary implementation of the present application, the number of sub-instructions resulting from the division may be recorded. For example, a counter may be set to indicate the number of sub-instructions that have been processed; in this way, it can be ensured that every sub-instruction is executed. According to exemplary implementations of the present application, other verification mechanisms may also be used to ensure that each sub-instruction is executed. For example, a signature of the data to be copied at the source address 420 of the memory copy instruction 210 may be computed, and after all of the sub-instructions have been executed, a signature of the data that has been copied to the destination address 430 of the memory copy instruction 210 may be computed. If the two signatures agree, the original memory copy instruction 210 may be regarded as having been executed successfully.
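For purposes of illustration only, one possible form of the verification just described is sketched below; the choice of a CRC32 checksum as the signature, and the function names, are assumptions made for this example, as the present application does not prescribe a particular signature algorithm.

```python
import zlib

def signature(data: bytes) -> int:
    # Any checksum or hash over the copied bytes may serve as the signature;
    # CRC32 is used here purely as an example.
    return zlib.crc32(data)

def copy_verified(src_view: bytes, dst_view: bytes, remaining_subs: int) -> bool:
    """After the sub-instruction counter reaches zero, compare the signature of
    the source data with that of the data now present at the destination."""
    if remaining_subs != 0:
        return False          # not all sub-instructions have completed yet
    return signature(src_view) == signature(dst_view)

# Example: both views hold the same bytes, so the copy is considered verified.
assert copy_verified(b"payload", b"payload", remaining_subs=0)
```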
It has been described above how the received plurality of instructions are stored to the first queue 132 and the second queue 134, respectively. Returning to FIG. 3, how the instructions in the first queue 132 and the second queue 134 are executed in parallel will now be described. At block 330 of FIG. 3, the instructions stored in the first queue 132 and the second queue 134 are executed in parallel. It will be understood that parallel execution here means that the two queues are independent of each other and can be drained at the same time, while the instructions within each queue are executed in the order in which they were stored to that queue.
Specifically, in the first queue 132, the at least one memory copy instruction may be executed in the order of the at least one memory copy instruction in the first queue 132. Assuming that the first queue 132 includes a memory copy instruction A and a memory copy instruction B, the memory copy instruction A should be executed first, and execution of the memory copy instruction B starts only after the memory copy instruction A has been executed successfully. In the second queue 134, the at least one read-write instruction may be executed in the order of the at least one read-write instruction in the second queue 134. Assuming that the second queue 134 includes a read-write instruction C and a read-write instruction D, the read-write instruction C should be executed first, and execution of the read-write instruction D starts only after the read-write instruction C has been executed successfully. It will be appreciated that since the first queue 132 and the second queue 134 are independent of each other, the instructions in the two queues can be executed in parallel without any timing dependency between the queues; it is sufficient to ensure that the memory copy instruction B is executed after the memory copy instruction A and that the read-write instruction D is executed after the read-write instruction C.
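For purposes of illustration only, the in-order, mutually independent draining of the two queues may be pictured as two worker loops of the following form; the use of software threads and a None sentinel is an assumption made to illustrate that ordering is enforced within each queue but not across queues.

```python
import threading
from queue import Queue

def drain(q: Queue, execute) -> None:
    # Instructions within one queue run strictly in the order they were stored;
    # the two drain loops run in parallel and are independent of each other.
    while True:
        instr = q.get()
        if instr is None:        # sentinel marking the end of the stream
            break
        execute(instr)

first_queue, second_queue = Queue(), Queue()
for i in ("memory copy A", "memory copy B", None):
    first_queue.put(i)
for i in ("read-write C", "read-write D", None):
    second_queue.put(i)

t1 = threading.Thread(target=drain, args=(first_queue, print))
t2 = threading.Thread(target=drain, args=(second_queue, print))
t1.start(); t2.start()
t1.join(); t2.join()
```

In this sketch, B always runs after A and D always runs after C, but no ordering holds between the two queues, which is exactly the guarantee described above.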
With the exemplary implementation of the present application, multiple instructions are stored to corresponding queues, which may avoid the situation where the read-write instruction 220 is blocked by the memory copy instruction 210. Further, since the instructions in the first queue 132 and the second queue 134 can be executed in parallel, the execution efficiency of the instructions accessing the memory can be improved. In this way, the degree of parallel processing of multiple instructions from processing unit 140 may be increased, thereby increasing the overall performance of the computing device.
The various steps of the method 300 performed by the data processing apparatus 110 shown in FIG. 1 have been described above. Alternative implementations of the data processing apparatus 110 will be described below with reference to FIG. 7. FIG. 7 is a block diagram 700 of a hardware structure of a data processing apparatus according to an exemplary implementation of the present application. As shown in FIG. 7, the data processing apparatus 110 may comprise a plurality of hardware units in order to implement the method 300 described above. It will be understood that the plurality of hardware units may be used here in place of the control unit 120 in the data processing apparatus 110, and that the various steps of the method 300 are then performed by the plurality of hardware units together.
In particular, the processor interface unit 710 may be used to connect to the processing unit 140, to receive a plurality of instructions from the processing unit 140, and to return the results of executing the instructions to the processing unit 140. The enqueue unit 720 may be configured to store the received plurality of instructions to the corresponding queues, respectively. The enqueue unit 720 may include an address unit 722 and a timing unit 724, where the address unit 722 may be used to determine the address dependencies among the plurality of received instructions, and the timing unit 724 may be used to determine the timing relationships among the plurality of received instructions. The buffer unit 130 may use the first queue 132 and the second queue 134 to store the received plurality of instructions; specifically, the first queue 132 may be used to store the memory copy instructions 210 and the second queue 134 may be used to store the read-write instructions 220. The instruction dispatch unit 740 may be used to execute the instructions in the first queue 132 and the second queue 134 in parallel. The memory command generation unit 750 may be used to convert the instructions in the queues (including memory copy instructions and read-write instructions) into memory commands that the memory can understand. The memory interface unit 770 is used to connect to the memory 142 in the computing device. The temporary storage unit 760 may be used to temporarily store execution results from the memory 142. For example, for a memory copy instruction, the execution result may include a response (e.g., success/failure) to executing the instruction; for a read instruction, the execution result may include the data read from memory; for a write instruction, the execution result may include a response (e.g., success/failure) to executing the instruction.
Hereinafter, the process of handling a plurality of instructions will be described with reference to the hardware structure of FIG. 7. The processor interface unit 710 may receive a plurality of instructions from the processing unit 140, as indicated by arrow 712. In turn, as indicated by arrow 780, the enqueue unit 720 may receive the plurality of instructions, and the address unit 722 may determine the address dependencies between the plurality of instructions. If it is detected that the plurality of instructions includes a direct read-write instruction, the direct read-write instruction may be sent directly to the instruction dispatch unit 740 for immediate execution, as indicated by arrow 782. The timing unit 724 may determine the timing relationships between the plurality of instructions. Then, as indicated by arrow 784, the plurality of instructions may be stored to the first queue 132 and the second queue 134, respectively, according to their types and based on the determined address dependencies and timing relationships. The instruction dispatch unit 740 may dispatch the instructions in the first queue 132 and the second queue 134 in parallel, as indicated by arrow 786.
As indicated by arrow 788, the instruction dispatch unit 740 may send the instructions in the first queue 132 and the second queue 134 to the memory command generation unit 750 to generate the corresponding memory commands. Further, the generated memory commands may be sent to the memory interface unit 770, as indicated by arrow 790, and then submitted to the memory 142, as indicated by arrow 792. The results of executing the memory commands may be received from the memory 142 and held in the temporary storage unit 760, as indicated by arrow 794. The instruction dispatch unit 740 may retrieve the results from the temporary storage unit 760, as indicated by arrow 796, and send them to the processor interface unit 710 via arrow 798. Finally, the processor interface unit 710 may return the results to the processing unit 140.
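For purposes of illustration only, the translation performed by the memory command generation unit 750 may be sketched as follows; the (kind, address, length) command format and the dictionary-based instruction representation are assumptions made for this example, not a command set defined by the present application.

```python
from typing import Dict, List, Tuple

# Assumed elementary command format: (kind, address, length).
MemCommand = Tuple[str, int, int]

def to_memory_commands(instr: Dict) -> List[MemCommand]:
    """Translate a queued instruction into elementary memory commands.

    A memory copy instruction becomes a read of the source address followed by
    a write of the destination address; a read or write instruction maps to a
    single command."""
    if instr["type"] == "memcpy":
        return [("read", instr["src"], instr["len"]),
                ("write", instr["dst"], instr["len"])]
    return [(instr["type"], instr["addr"], instr["len"])]

# Example: a 4 KiB copy produces one read command and one write command.
print(to_memory_commands({"type": "memcpy", "src": 0x1000, "dst": 0x5000, "len": 4096}))
```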
According to an exemplary implementation of the present application, each of the units described above may be implemented as a hardware circuit. In this way, the plurality of instructions from the processing unit 140 can be processed at a higher speed. According to an exemplary implementation of the present application, the units described above may be implemented in a programmable circuit. Alternatively and/or additionally, the units described above may also be implemented in firmware and/or a combination of hardware and instructions.
In the foregoing, specific implementations of the methods and apparatuses according to the exemplary implementations of the present application have been described. In the following, how an apparatus according to an exemplary implementation of the present application may be deployed in a computing device will be described with reference to FIG. 8. FIG. 8 is a block diagram 800 of deploying the data processing apparatus 110 in a computing device according to an exemplary implementation of the present application. According to an example implementation of the present application, the data processing apparatus 110 may be deployed on the memory side of a computing device. In FIG. 8, the processing unit 140 is coupled to the memory manager 820. According to an exemplary implementation of the present application, the data processing apparatus 110 may be deployed in the memory manager 820. The data processing apparatus 110 may store a plurality of instructions from the processing unit 140 into the first queue 132 and the second queue 134, respectively, and execute the instructions in the two queues in parallel. With this exemplary implementation, the data processing apparatus 110 is closer to the memory, which helps to increase the speed of instruction execution.
According to an example implementation of the present application, the data processing apparatus may be deployed in one or more memory managers in a computing device. For example, the data processing apparatus 110 may be deployed in the memory manager 820, while the memory manager 922 may remain unchanged without deploying the data processing apparatus 110.
According to an exemplary implementation of the present application, the data processing apparatus 110 may instead be deployed in a processing unit of the computing device. As shown in FIG. 8, the processing unit 140 may include a memory controller 810, and the memory controller 810 is used for controlling the operation of the memory manager 820. The data processing apparatus 110 may be deployed in the memory controller 810; in other words, the data processing apparatus 110 shown in the memory manager 820 in FIG. 8 may be moved to the memory controller 810. With this exemplary implementation, the data processing apparatus 110 is closer to the processing unit 140, which facilitates reuse of various resources in the processing unit 140.
In either deployment, the data processing apparatus is arranged to perform the functions implemented by the data processing apparatus in FIGS. 1 to 7. It should be understood that the processing unit 140, the memory manager 820, and the memory 144 communicate via a bus (not shown); the communication may also be realized by other means, such as wireless transmission.
For illustrative purposes only, exemplary implementations according to the present application are listed below.
According to an exemplary implementation of the present application, there is provided a data processing apparatus including: a control unit and a cache unit coupled to the control unit; the cache unit is used for storing a plurality of instructions received by the control unit by utilizing a first queue and a second queue, the plurality of instructions comprise memory copy instructions and read-write instructions, the first queue is used for storing the memory copy instructions, and the second queue is used for storing the read-write instructions; a control unit for: receiving a plurality of instructions; storing a plurality of instructions into a first queue and a second queue according to a preset rule and the types of the plurality of instructions; and executing the instructions stored in the first queue and the second queue in parallel.
According to an exemplary implementation of the present application, the predetermined rule includes an address dependency relationship between the plurality of instructions, the address dependency relationship being determined according to addresses carried in the plurality of instructions.
According to an exemplary implementation of the application, the address dependency is determined according to at least any one of the following: judging whether a memory copy instruction and a read-write instruction in the plurality of instructions access the same page or not, and if the memory copy instruction and the read-write instruction access the same page, determining that the memory copy instruction and the read-write instruction have an address dependency relationship; and if the memory copy instruction and the read-write instruction do not access the same page, determining that the memory copy instruction and the read-write instruction do not have an address dependency relationship.
According to an exemplary implementation of the present application, the memory copy instruction includes: a source address, a destination address, and a copy length, the memory copy instruction indicating that data at the source address is copied to the destination address based on the memory copy length.
According to an exemplary implementation of the application, the control unit is further configured to: detecting a direct read-write instruction in the read-write instructions in the plurality of instructions based on the address dependency relationship, wherein the direct read-write instruction does not have the address dependency relationship with any other memory copy instruction in the plurality of instructions; and executing the direct read and write instructions.
According to an exemplary implementation of the present application, storing the plurality of instructions to the first queue and the second queue, respectively, comprises: determining whether the memory copy instruction and a read-write instruction in the plurality of instructions have an address dependency relationship or not aiming at the memory copy instruction in the plurality of instructions; and if the memory copy instruction is determined not to have an address dependency relationship with any read-write instruction in the plurality of instructions, storing the memory copy instruction to a first queue.
According to an exemplary implementation of the present application, storing the plurality of instructions to the first queue and the second queue, respectively, comprises: if the memory copy instruction and the read-write instruction are determined to have the address dependency relationship, determining the time sequence relationship of the memory copy instruction and the received read-write instruction; and respectively storing the memory copy instruction and the read-write instruction to the first queue and the second queue based on the time sequence relation and the address dependency relation.
According to an exemplary implementation of the application, the control unit is further configured to: if the copying length of the memory copying instruction is determined to exceed a preset threshold value, dividing the memory copying instruction into a plurality of sub-instructions; and storing the plurality of sub-instructions to a first queue.
According to an exemplary implementation of the present application, executing instructions stored in the first queue and the second queue in parallel comprises: executing at least one memory copy instruction according to the sequence of the at least one memory copy instruction in the first queue; and executing the at least one read-write instruction in the order of the at least one read-write instruction in the second queue in parallel with executing the at least one memory copy instruction.
According to an exemplary implementation of the present application, there is provided a data processing method, including: receiving a plurality of instructions, wherein the plurality of instructions comprise a memory copy instruction and a read-write instruction; according to a preset rule and the types of a plurality of instructions, the plurality of instructions are respectively stored in a first queue and a second queue, the first queue is used for storing memory copy instructions, and the second queue is used for storing read-write instructions; and executing the instructions stored in the first queue and the second queue in parallel.
According to an exemplary implementation of the present application, the predetermined rule includes an address dependency relationship between the plurality of instructions, the address dependency relationship being determined according to addresses carried in the plurality of instructions.
According to an exemplary implementation of the application, the address dependencies are determined according to at least any one of: judging whether a memory copy instruction and a read-write instruction in the plurality of instructions access the same page or not, and if the memory copy instruction and the read-write instruction access the same page, determining that the memory copy instruction and the read-write instruction have an address dependency relationship; and if the memory copy instruction and the read-write instruction do not access the same page, determining that the memory copy instruction and the read-write instruction do not have an address dependency relationship.
According to an exemplary implementation of the present application, the memory copy instruction includes: a source address, a destination address, and a copy length, the memory copy instruction indicating that data at the source address is copied to the destination address based on the memory copy length.
According to an exemplary implementation of the application, the method further comprises: detecting a direct read-write instruction in the read-write instructions in the plurality of instructions based on the address dependency relationship, wherein the direct read-write instruction does not have the address dependency relationship with any other memory copy instruction in the plurality of instructions; and executing the direct read and write instructions.
According to an exemplary implementation of the present application, storing the plurality of instructions to the first queue and the second queue, respectively, comprises: determining whether the memory copy instruction and a read-write instruction in the plurality of instructions have an address dependency relationship or not aiming at the memory copy instruction in the plurality of instructions; and if the memory copy instruction is determined not to have an address dependency relationship with any read-write instruction in the plurality of instructions, storing the memory copy instruction to a first queue.
According to an exemplary implementation of the present application, storing the plurality of instructions to the first queue and the second queue, respectively, comprises: if the memory copy instruction and the read-write instruction are determined to have the address dependency relationship, determining the time sequence relationship of the memory copy instruction and the received read-write instruction; and respectively storing the memory copy instruction and the read-write instruction to the first queue and the second queue based on the time sequence relation and the address dependency relation.
According to an exemplary implementation of the application, the method further comprises: if the copying length of the memory copying instruction is determined to exceed a preset threshold value, dividing the memory copying instruction into a plurality of sub-instructions; and storing the plurality of sub-instructions to a first queue.
According to an exemplary implementation of the present application, executing instructions stored in the first queue and the second queue in parallel comprises: executing at least one memory copy instruction according to the sequence of the at least one memory copy instruction in the first queue; and executing the at least one read-write instruction in the order of the at least one read-write instruction in the second queue in parallel with executing the at least one memory copy instruction.
According to an exemplary implementation of the application, the method further comprises: receiving a read-write result obtained by executing the direct read-write instruction; and returning the read-write result.
According to an exemplary implementation manner of the present application, determining that there is no address dependency relationship between the memory copy instruction and the read-write instruction includes: and if the source address and the destination address of the memory copy instruction are different from the address of the read-write instruction, determining that the memory copy instruction and the read-write instruction do not have an address dependency relationship.
According to an exemplary implementation manner of the present application, storing the memory copy instruction and the read-write instruction to the first queue and the second queue, respectively, based on the timing relationship and the address dependency relationship includes: and if the time sequence relation indicates that the memory copy instruction is received before the read-write instruction and the address dependency relation indicates that the destination address of the memory copy instruction and the address of the read-write instruction access the same page, storing the read-write instruction to a second queue after the memory copy instruction is executed.
According to an exemplary implementation manner of the present application, the read-write instruction includes a write instruction, and the storing the memory copy instruction and the read-write instruction to the first queue and the second queue, respectively, based on the timing relationship and the address dependency relationship includes: if the timing relationship indicates that the memory copy instruction is received before the write instruction and the address dependency indicates that the source address of the memory copy instruction and the address of the write instruction access the same page, the write instruction is stored to the second queue after the memory copy instruction has been executed.
According to an exemplary implementation manner of the present application, storing the memory copy instruction and the read-write instruction to the first queue and the second queue, respectively, based on the timing relationship and the address dependency relationship includes: if the timing relationship indicates that the memory copy instruction is received after the read-write instruction and the address dependency relationship indicates that the destination address of the memory copy instruction and the address of the read-write instruction access the same page, after the read-write instruction has been executed, the memory copy instruction is stored to the first queue.
According to an exemplary implementation manner of the present application, the read-write instruction includes a write instruction, and the storing the memory copy instruction and the read-write instruction to the first queue and the second queue, respectively, based on the timing relationship and the address dependency relationship includes: if the timing relationship indicates that the memory copy instruction is received after the write instruction and the address dependency indicates that the source address of the memory copy instruction and the address of the write instruction access the same page, storing the memory copy instruction to the first queue after the write instruction has been executed.
According to an exemplary implementation manner of the present application, determining that the memory copy instruction and the read-write instruction access the same page includes: respectively setting a read counter and a write counter for a page in a memory; and determining that the memory copy instruction and the read-write instruction access the same page according to the read counter and the write counter.
According to an exemplary implementation of the present application, setting the read counter includes: if it is determined that an instruction of the plurality of instructions is to read a page, incrementing a read counter; and decreasing the value of the read counter if it is determined that an instruction of the plurality of instructions has read a page.
According to an exemplary implementation of the present application, setting the write counter includes: if it is determined that an instruction of the plurality of instructions is to be written to the page, incrementing a write counter; and decreasing the value of the write counter if it is determined that an instruction of the plurality of instructions has written to the page.
According to an exemplary implementation manner of the present application, determining that the memory copy instruction and the read-write instruction access the same page according to the read counter and the write counter includes: and determining that the memory copy instruction and the read-write instruction access the same page according to the fact that the numerical value of the write counter is different from the initial value of the write counter.
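For purposes of illustration only, the per-page counters described in the preceding paragraphs may be sketched as follows; keeping them in a dictionary keyed by page number is an assumption made for this example, since a hardware implementation would hold them in dedicated registers or tables.

```python
from collections import defaultdict

PAGE_SHIFT = 12                       # assume 4 KiB pages

read_counter = defaultdict(int)       # outstanding read accesses per page
write_counter = defaultdict(int)      # outstanding write accesses per page

def page(addr: int) -> int:
    return addr >> PAGE_SHIFT

def begin_read(addr: int) -> None:  read_counter[page(addr)] += 1
def end_read(addr: int) -> None:    read_counter[page(addr)] -= 1
def begin_write(addr: int) -> None: write_counter[page(addr)] += 1
def end_write(addr: int) -> None:   write_counter[page(addr)] -= 1

def same_page_write_pending(addr: int) -> bool:
    """A write counter that differs from its initial value of zero indicates
    that another instruction is still writing this page, i.e. the two
    instructions access the same page."""
    return write_counter[page(addr)] != 0
```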
According to an exemplary implementation of the present application, the memory copy instruction further includes a cache tag to indicate that the data is to be loaded to a cache associated with the memory.
According to an exemplary implementation of the present application, there is provided a data processing apparatus including: the processor interface unit is used for receiving a plurality of instructions, and the plurality of instructions comprise a memory copy instruction and a read-write instruction; the enqueue unit is coupled to the interface unit and used for respectively storing the plurality of instructions into a first queue and a second queue in the cache unit according to a preset rule and the types of the plurality of instructions, wherein the first queue is used for storing the memory copy instruction, and the second queue is used for storing the read-write instruction; a cache unit coupled to the enqueue unit to store a plurality of instructions utilizing the first queue and the second queue; and an instruction scheduling unit coupled to the cache unit for executing the instructions stored in the first queue and the second queue in parallel.
According to an exemplary implementation of the present application, there is provided a computing device comprising: a processing unit; and a data processing apparatus according to the above description, wherein the processing unit is adapted to send a plurality of instructions to the data processing apparatus.
According to an exemplary implementation of the present application, there is provided a computing device comprising: a processing unit; and a memory manager for performing the method according to the above description, wherein the processing unit is adapted to send a plurality of instructions to the memory manager.
According to an exemplary implementation of the application, a computer program product is provided, tangibly stored on a non-transitory computer-readable medium and comprising machine executable instructions for performing a method according to the application.
According to an exemplary implementation of the present application, a computer-readable medium is provided. The computer-readable medium has stored thereon machine-executable instructions that, when executed by at least one processor, cause the at least one processor to implement a method according to the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a Solid State Drive (SSD).
The foregoing is only illustrative of the present application. Those skilled in the art can conceive of changes or substitutions based on the specific embodiments provided in the present application, and all such changes or substitutions are intended to be included within the scope of the present application.

Claims (20)

1. A data processing apparatus, characterized in that the apparatus comprises a control unit and a cache unit coupled to the control unit;
the cache unit is configured to store a plurality of instructions received by the control unit by using a first queue and a second queue, where the plurality of instructions include a memory copy instruction and a read-write instruction, the first queue is configured to store the memory copy instruction, and the second queue is configured to store the read-write instruction;
the control unit is used for receiving the plurality of instructions; storing the plurality of instructions to the first queue and the second queue according to a predetermined rule and the types of the plurality of instructions; and executing instructions stored in the first queue and the second queue in parallel.
2. The apparatus of claim 1, wherein the predetermined rules comprise address dependencies of the plurality of instructions, the address dependencies being determined according to addresses carried in the plurality of instructions.
3. The apparatus of claim 2, wherein the address dependency is determined according to at least any one of:
judging whether a memory copy instruction and a read-write instruction in the plurality of instructions access the same page or not;
if the memory copy instruction and the read-write instruction access the same page, determining that the address dependency relationship exists between the memory copy instruction and the read-write instruction; and
and if the memory copy instruction and the read-write instruction do not access the same page, determining that the address dependency does not exist between the memory copy instruction and the read-write instruction.
4. The apparatus of claim 3, wherein the memory copy instruction comprises: a source address, a destination address, and a copy length, the memory copy instruction indicating that data at the source address is to be copied to the destination address based on the memory copy length.
5. The apparatus of claim 4, wherein the control unit is further configured to:
detecting a direct read-write instruction in read-write instructions in the plurality of instructions based on the address dependency relationship, wherein the direct read-write instruction does not have an address dependency relationship with any memory copy instruction in the plurality of instructions; and
and executing the direct reading and writing instruction.
6. The apparatus of claim 4, wherein storing the plurality of instructions to the first queue and the second queue, respectively, comprises:
determining whether the memory copy instruction and a read-write instruction in the plurality of instructions have an address dependency relationship or not aiming at the memory copy instruction in the plurality of instructions; and
and if the memory copy instruction is determined not to have an address dependency relationship with any read-write instruction in the plurality of instructions, storing the memory copy instruction to the first queue.
7. The apparatus of claim 4, wherein the predetermined rules further include a timing relationship that the memory copy instruction and the read/write instruction are received, and wherein storing the plurality of instructions in the first queue and the second queue, respectively, comprises:
and if the memory copy instruction and the read-write instruction are determined to have the address dependency relationship, respectively storing the memory copy instruction and the read-write instruction to the first queue and the second queue based on the time sequence relationship and the address dependency relationship.
8. The apparatus of claim 3, wherein the control unit is further configured to:
if the copying length of the memory copying instruction is determined to exceed a preset threshold value, dividing the memory copying instruction into a plurality of sub-instructions; and
storing the plurality of sub-instructions to the first queue.
9. The apparatus of claim 1, wherein the executing the instructions stored in the first queue and the second queue in parallel comprises:
executing at least one memory copy instruction in the order of the at least one memory copy instruction in the first queue; and
and executing the at least one read-write instruction according to the sequence of the at least one read-write instruction in the second queue in parallel with executing the at least one memory copy instruction.
10. A method of data processing, the method comprising:
receiving a plurality of instructions, wherein the plurality of instructions comprise a memory copy instruction and a read-write instruction;
according to a preset rule and the types of the instructions, the instructions are respectively stored in a first queue and a second queue, the first queue is used for storing memory copy instructions, and the second queue is used for storing read-write instructions; and
instructions stored in the first queue and the second queue are executed in parallel.
11. The method of claim 10, wherein the predetermined rules include address dependencies of the plurality of instructions, the address dependencies being determined according to addresses carried in the plurality of instructions.
12. The method of claim 11, wherein the address dependency is determined according to at least any one of the following:
judging whether a memory copy instruction and a read-write instruction in the plurality of instructions access the same page or not;
if the memory copy instruction and the read-write instruction access the same page, determining that the address dependency relationship exists between the memory copy instruction and the read-write instruction; and
and if the memory copy instruction and the read-write instruction do not access the same page, determining that the address dependency does not exist between the memory copy instruction and the read-write instruction.
13. The method of claim 12, wherein the memory copy instruction comprises: a source address, a destination address, and a copy length, the memory copy instruction indicating that data at the source address is to be copied to the destination address based on the memory copy length.
14. The method of claim 13, further comprising:
detecting a direct read-write instruction in read-write instructions in the plurality of instructions based on the address dependency relationship, wherein the direct read-write instruction does not have an address dependency relationship with any memory copy instruction in the plurality of instructions; and
and executing the direct reading and writing instruction.
15. The method of claim 13, wherein storing the plurality of instructions to the first queue and the second queue, respectively, comprises:
determining whether the memory copy instruction and a read-write instruction in the plurality of instructions have an address dependency relationship or not aiming at the memory copy instruction in the plurality of instructions; and
and if the memory copy instruction is determined not to have an address dependency relationship with any read-write instruction in the plurality of instructions, storing the memory copy instruction to the first queue.
16. The method of claim 13, wherein the predetermined rules further include a timing relationship that the memory copy instruction and the read/write instruction are received, and wherein storing the plurality of instructions in the first queue and the second queue, respectively, comprises:
and if the memory copy instruction and the read-write instruction are determined to have the address dependency relationship, respectively storing the memory copy instruction and the read-write instruction to the first queue and the second queue based on the time sequence relationship and the address dependency relationship.
17. The method of claim 13, further comprising:
if the copying length of the memory copying instruction is determined to exceed a preset threshold value, dividing the memory copying instruction into a plurality of sub-instructions; and
storing the plurality of sub-instructions to the first queue.
18. The method of claim 10, wherein the executing the instructions stored in the first queue and the second queue in parallel comprises:
executing at least one memory copy instruction in the order of the at least one memory copy instruction in the first queue; and
and executing the at least one read-write instruction according to the sequence of the at least one read-write instruction in the second queue in parallel with executing the at least one memory copy instruction.
19. A memory manager, characterized in that it comprises data processing means for implementing the functionality of the data processing means according to any one of claims 1 to 9 or the operating steps of the method according to any one of claims 10 to 18.
20. A computing device comprising a processing unit and a memory manager, the memory manager comprising data processing means for implementing the functions of the data processing means according to any of claims 1 to 9 or the operational steps of the method according to any of claims 10 to 18.
CN202010526781.3A 2020-06-09 2020-06-09 Apparatus, method, and computing device for performing data processing Pending CN113778914A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010526781.3A CN113778914A (en) 2020-06-09 2020-06-09 Apparatus, method, and computing device for performing data processing
PCT/CN2021/088556 WO2021249029A1 (en) 2020-06-09 2021-04-21 Apparatus and method for executing data processing, and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010526781.3A CN113778914A (en) 2020-06-09 2020-06-09 Apparatus, method, and computing device for performing data processing

Publications (1)

Publication Number Publication Date
CN113778914A true CN113778914A (en) 2021-12-10

Family

ID=78834944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010526781.3A Pending CN113778914A (en) 2020-06-09 2020-06-09 Apparatus, method, and computing device for performing data processing

Country Status (2)

Country Link
CN (1) CN113778914A (en)
WO (1) WO2021249029A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636916B2 (en) * 2004-05-05 2009-12-22 International Business Machines Corporation Self-optimizing workload distribution among virtual storage controllers
US9569349B2 (en) * 2008-12-19 2017-02-14 Ati Technologies Ulc Method and apparatus for reallocating memory content
CN103019655B (en) * 2012-11-28 2015-07-29 中国人民解放军国防科学技术大学 Towards memory copying accelerated method and the device of multi-core microprocessor
CN110737536B (en) * 2019-09-19 2024-03-12 亚信创新技术(南京)有限公司 Message storage method and message reading method based on shared memory

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050240733A1 (en) * 2004-04-22 2005-10-27 International Business Machines Corporation Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US20090307473A1 (en) * 2008-06-09 2009-12-10 Emulex Design & Manufacturing Corporation Method for adopting sequential processing from a parallel processing architecture
CN102968395A (en) * 2012-11-28 2013-03-13 中国人民解放军国防科学技术大学 Method and device for accelerating memory copy of microprocessor
CN104298609A (en) * 2014-08-14 2015-01-21 浪潮(北京)电子信息产业有限公司 Data copying method and device
WO2017124917A1 (en) * 2016-01-18 2017-07-27 中兴通讯股份有限公司 Data processing method and apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124370A1 (en) * 2021-12-30 2023-07-06 上海商汤智能科技有限公司 Instruction synchronization apparatus, chip, computer device, and data processing method
WO2024046018A1 (en) * 2022-09-02 2024-03-07 上海寒武纪信息科技有限公司 Instruction control method, data caching method, and related products
CN115586737A (en) * 2022-11-18 2023-01-10 合肥安迅精密技术有限公司 Chip mounter surface mounting control method and system based on software and hardware cooperative processing
CN115586737B (en) * 2022-11-18 2023-04-07 合肥安迅精密技术有限公司 Chip mounter surface mounting control method and system based on software and hardware cooperative processing
CN116483288A (en) * 2023-06-21 2023-07-25 苏州浪潮智能科技有限公司 Memory control equipment, method and device and server memory module

Also Published As

Publication number Publication date
WO2021249029A1 (en) 2021-12-16

Similar Documents

Publication Publication Date Title
CN113778914A (en) Apparatus, method, and computing device for performing data processing
US10019181B2 (en) Method of managing input/output(I/O) queues by non-volatile memory express(NVME) controller
US9396353B2 (en) Data allocation among devices with different data rates
JP6181860B2 (en) Storage apparatus, data processing method thereof, and storage system
US9830189B2 (en) Multi-threaded queuing system for pattern matching
US8782643B2 (en) Device and method for controlling communication between BIOS and BMC
US10025670B2 (en) Information processing apparatus, memory dump method, and storage medium
US10545890B2 (en) Information processing device, information processing method, and program
US20190286582A1 (en) Method for processing client requests in a cluster system, a method and an apparatus for processing i/o according to the client requests
US9575827B2 (en) Memory management program, memory management method, and memory management device
US8667223B2 (en) Shadow registers for least recently used data in cache
US9448845B2 (en) Extendible input/output data mechanism for accelerators
US20100332914A1 (en) Dump output control apparatus and dump output control method
US10198365B2 (en) Information processing system, method and medium
US9442790B2 (en) Computer and dumping control method
US11108698B2 (en) Systems and methods for client-side throttling after server handling in a trusted client component
US9218211B2 (en) Priority promotion for service requests
CN116089477A (en) Distributed training method and system
US9430338B2 (en) Method and computing device for recording log entries
US10540117B2 (en) Storage system including a plurality of networked storage nodes
KR101559929B1 (en) Apparatus and method for virtualization
US9176806B2 (en) Computer and memory inspection method
US20150100825A1 (en) Information processing device and method
KR101790728B1 (en) Method for data input-output of hypervisor in virtualization, and recording medium thereof
US20220413996A1 (en) Computer-readable recording medium storing acceleration test program, acceleration test method, and acceleration test apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination