WO2018166337A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2018166337A1
WO2018166337A1 (PCT/CN2018/077026, CN2018077026W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
cache
address
external storage
module
Prior art date
Application number
PCT/CN2018/077026
Other languages
English (en)
Chinese (zh)
Inventor
徐志通
孙璐
熊礼文
崔鲁平
陈俊锐
余谓为
李又麟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2018166337A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method and apparatus.
  • In general, the hardware system of a computer device needs to support both the big-endian data format and the little-endian data format.
  • However, the processor instruction pipeline in the computer device often supports only one data format, for example, only the little-endian data format or only the big-endian data format.
  • The data read by the computer device from the cache (Cache) is sent to the processor instruction pipeline.
  • When the data format in the external storage is consistent with the data format supported by the processor instruction pipeline (endianness match), the data read from the Cache only needs data alignment (a right shift) before it can be sent to the pipeline.
  • When the data format in the external storage is inconsistent with the data format supported by the processor instruction pipeline (endianness mismatch), the data read from the Cache needs, in addition to data alignment, further processing on the load hit path such as endianness conversion and sign extension before it can be sent to the processor instruction pipeline for subsequent processing.
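The conventional load-hit path just described can be sketched as follows. This is an illustrative model only (the function name, the 16-byte width, and the sample data are assumptions, not from the patent); it shows that both the alignment and the byte swap sit after the Cache read, on the load hit path:

```python
# Illustrative sketch of the conventional (prior-art) load-hit path: after a
# 16-byte word is read from the Cache, the hardware must still right-align the
# requested bytes and byte-swap them before the little-endian pipeline can use
# them. Names and widths are assumptions for illustration.

def conventional_load_hit(cache_word: bytes, offset: int, k: int) -> bytes:
    """Return k bytes starting at `offset`, endian-converted for the pipeline."""
    aligned = cache_word[offset:offset + k]   # data alignment (right shift)
    return aligned[::-1]                      # endian conversion after the read

line = bytes(range(16))                       # a 16-byte cacheline of big-endian data
print(conventional_load_hit(line, 7, 4).hex())
```

Both post-read steps add logic depth to the hit path, which is the delay the method below removes.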
  • The embodiments of the present application provide a data processing method and apparatus, so as to at least reduce the load-hit delay when the big-endian and little-endian data formats are inconsistent.
  • the embodiment of the present application provides the following technical solutions:
  • A data processing method, comprising: acquiring a read instruction sent by a processor instruction pipeline, where the read instruction includes address information of first data to be read in an external storage of a cache (Cache), the read/write width of the Cache is 2P bytes, the number of bytes of the first data is K ≤ P, and both K and P are positive integers; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data within third data, the third data is the data obtained by performing big-endian/little-endian format conversion on the cache block (cacheline) containing the first data, and the size of the cacheline is ≥ 2P; reading fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data; cyclically shifting the fourth data right by a first byte count to obtain fifth data, where the second data is the data on the low K byte addresses of the fifth data; and sending the second data to the processor instruction pipeline.
  • Here, Index1 = ~(Address[n:0] + K - 1), where Index1 represents the first byte count, ~ indicates bitwise negation, n = log2(P), and Address[n:0] indicates the value of the low (n+1)-bit address of the first address of the first data in the external storage of the Cache.
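The Index1 computation above can be checked numerically. This sketch assumes P = 8 (a 16-byte read/write width), so n = log2(P) = 3 and Address[n:0] is the low four bits of the start address; the concrete values are illustrative:

```python
# Sketch of the Index1 formula: Index1 = ~(Address[n:0] + K - 1), taken over
# the low (n+1) address bits. P = 8 is an assumption for illustration.

def index1(address: int, k: int, p: int = 8) -> int:
    n = p.bit_length() - 1            # n = log2(P) for a power-of-two P
    mask = (1 << (n + 1)) - 1         # keeps the low (n+1) address bits
    return ~((address & mask) + k - 1) & mask

# Reading K = 4 bytes whose first address ends in 0x7 (the FIG. 7 example):
print(index1(0x7, 4))                 # rotate-right amount in bytes
```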
  • Based on this scheme, the data read from the Cache is data that has already undergone big-endian/little-endian format conversion, so no endianness conversion needs to be performed after the data is read from the Cache. This avoids the increase in load-hit delay caused in the prior art by the endianness-conversion logic introduced after the Cache read when the data formats are inconsistent, thereby reducing the load-hit delay when the big-endian and little-endian data formats are inconsistent.
  • Optionally, before the fourth data of 2P bytes is read from the Cache according to the address information of the second data in the Cache, the method further includes: reading the cacheline containing the first data from the external storage of the Cache; performing big-endian/little-endian format conversion on the cacheline containing the first data to obtain the third data; and writing the third data into the Cache.
  • Based on this scheme, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endianness conversion. Consequently, no endianness conversion is needed after data is read from the Cache, which avoids the increase in load-hit delay caused in the prior art by the endianness-conversion logic introduced after the Cache read, thereby reducing the load-hit delay when the big-endian and little-endian data formats are inconsistent.
  • Optionally, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, cyclically shifting the sixth data right by a second byte count to obtain seventh data in which the first data occupies the low K byte addresses, and sending the first data to the processor instruction pipeline.
  • Optionally, before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage, the method further includes: reading the cacheline containing the first data from the external storage of the Cache, and writing the cacheline containing the first data into the Cache. Based on this scheme, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache.
  • Optionally, the method further includes: acquiring a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, cyclically shifting the eighth data left by a third byte count to obtain ninth data of 2P bytes.
  • Here, Index3 = ~(Address[n:0] + T - 1), where Index3 represents the third byte count; the ninth data is written into the Cache.
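One possible reading of this store path can be sketched as follows, assuming P = 8 and assuming that the write-channel endianness conversion byte-reverses the store data before the left rotation by Index3 (the text above specifies only the rotation, so the explicit reversal step here is an assumption made so that each byte lands at the mirrored cacheline offset):

```python
# Sketch of the mismatched-endianness store path (an illustrative reading, not
# a definitive implementation): the T-byte eighth data is byte-reversed by the
# write-channel conversion and rotated left by Index3 = ~(Address[n:0] + T - 1)
# within the 2P-byte write word. P = 8 (2P = 16 bytes) is assumed.

P = 8
MASK = 2 * P - 1                      # low (n+1) address bits, n = log2(P)

def rotl(word, r):
    """Rotate a list of bytes left by r positions (index 0 = lowest address)."""
    return [word[(i - r) % len(word)] for i in range(len(word))]

def ninth_data(eighth, address):
    t = len(eighth)
    index3 = ~((address & MASK) + t - 1) & MASK
    word = list(eighth[::-1]) + [0] * (2 * P - t)   # assumed endian conversion
    return rotl(word, index3)                       # left rotate by Index3

# Writing 4 bytes B0..B3 to addresses ending in 0x7..0xA: each byte at address
# a should land at Cache offset 15 - a, mirroring the byte-reversed cacheline.
word = ninth_data([0xB0, 0xB1, 0xB2, 0xB3], 0x7)
```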
  • the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
  • the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
  • an embodiment of the present application provides a data processing apparatus, which has a function of implementing behavior of a data processing apparatus in the foregoing method embodiment.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • An embodiment of the present application provides a data processing apparatus, including: a processor, a memory, a bus, and a communication interface. The memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus. When the data processing apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data processing apparatus performs the data processing method of any one of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium configured to store computer software instructions used by the above data processing apparatus, which, when run on a computer, enable the computer to perform the data processing method of any one of the first aspect.
  • an embodiment of the present application provides a computer program product comprising instructions, which when executed on a computer, enable the computer to perform the data processing method of any of the above first aspects.
  • FIG. 1 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent in the prior art;
  • FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied;
  • FIG. 3 is a schematic structural diagram of hardware of a data processing apparatus according to an embodiment of the present disclosure
  • FIG. 4 is a schematic flowchart 1 of a data processing method according to an embodiment of the present application.
  • FIG. 5 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent according to an embodiment of the present disclosure;
  • FIG. 6 is a second schematic flowchart of a data processing method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram 1 of an example of a data processing method according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram 2 of an example of a data processing method according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram 3 of an example of a data processing method according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram 4 of an example of a data processing method according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram 1 of a data processing apparatus according to an embodiment of the present disclosure.
  • FIG. 12 is a second schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic structural diagram 3 of a data processing apparatus according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied.
  • the multi-core system 100 includes a bus 101, a multi-core processor 102 connected to the bus 101, and a memory 103 connected to the bus 101.
  • The memory 103 may be a random access memory (RAM), a dynamic random access memory (DRAM), or the like, which is not specifically limited in this embodiment of the present application.
  • the bus 101 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For convenience of representation, only one thick line is shown in Figure 2, but it does not mean that there is only one bus or one type of bus.
  • The multi-core processor 102 includes a plurality of processor cores, such as a processor core 102a, a processor core 102b, ..., a processor core 102c. A processor core may be a central processing unit (CPU) core or a graphics processing unit (GPU) core, which is not specifically limited in this embodiment.
  • these processor cores are mainly used to perform calculations, and each processor core has its own level 1 cache (English: Level 1 Cache, abbreviation: L1C) and Level 2 Cache (abbreviation: L2C);
  • The processor cores share a last level cache (LLC), and multiple multi-core processors share a single memory.
  • When a processor core receives an instruction to read data, it first checks whether the address exists in the L1C. If it does, the processor core reads the data directly from the L1C; if it does not, the processor core continues to look in the L2C, and so on.
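The lookup order just described can be sketched as follows; the dictionary-based cache levels and their names are illustrative assumptions only:

```python
# Sketch of the hierarchical lookup: search L1C, then L2C, ..., falling back
# to memory, which always holds the data. The dict-backed levels are a model,
# not the patent's hardware design.

def read(address, levels):
    """Return (data, level_name) for the first level that holds `address`."""
    for name, store in levels[:-1]:
        if address in store:                 # hit at this cache level
            return store[address], name
    name, memory = levels[-1]                # memory is the final fallback
    return memory[address], name

levels = [("L1C", {}), ("L2C", {0x40: b"\xAB"}), ("memory", {0x40: b"\xAB"})]
print(read(0x40, levels))                    # misses L1C, hits L2C
```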
  • the embodiment of the present application is also applicable to a single-core system or a system including a Cache having a similar hierarchical storage structure, which is not specifically limited in this embodiment of the present application.
  • FIG. 3 is a schematic diagram showing the hardware structure of a data processing apparatus 30 according to an embodiment of the present application.
  • the data processing device 30 includes a processor 301, a memory 302, a communication interface 304, and a bus 303.
  • the processor 301, the communication interface 304, and the memory 302 are connected to one another via a bus 303.
  • The processor 301 is the control center of the data processing device 30. It connects the various parts of the entire data processing device 30 via the bus 303, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 302 and invoking the data stored in the memory 302, thereby monitoring the data processing device 30 as a whole.
  • the processor 301 can be any one of the processor cores in FIG. 2 above.
  • the memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the data processing device 30 by running software programs and modules stored in the memory 302.
  • The memory 302 mainly includes a program storage area and a data storage area, where the program storage area can store an operating system, an application required by at least one function, and the like; the data storage area can store data created according to the use of the data processing apparatus 30, and the like.
  • The memory 302 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • the memory 302 can be the memory in FIG. 2 above.
  • the bus 303 can be a PCI bus or an EISA bus or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 3, but it does not mean that there is only one bus or one type of bus.
  • bus 303 may be the bus of FIG. 2 described above.
  • Communication interface 304 is used for communication of data processing device 30 with external devices.
  • the data processing device 30 may also include a radio frequency (English: Radio Frequency, abbreviated as RF) circuit, an audio circuit, a communication interface, and/or a plurality of sensors, which are not specifically limited in this embodiment of the present application.
  • When the processor 301 is a processor core in FIG. 2, the memory 302 is the memory in FIG. 2, and the bus 303 is the bus in FIG. 2, the data processing device 30 provided by the embodiment of the present application may be the multi-core system in FIG. 2, which is not specifically limited in this embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application, which includes the following steps:
  • S401: The data processing device acquires a read instruction sent by the processor instruction pipeline, where the read instruction includes address information of the first data to be read in an external storage of the Cache.
  • The read/write width of the Cache is 2P bytes, the number of bytes of the first data is K ≤ P, and both K and P are positive integers.
  • The read/write width of the Cache being 2P bytes means that when data is read from the Cache, 2P bytes of data are read each time, and when data is written into the Cache, 2P bytes of data are written each time.
  • When the data processing device in this embodiment of the present application is the multi-core system in FIG. 2, the Cache in step S401 may be the L1C, and the external storage of the Cache may be the L2C, the LLC, the memory, or the like, which is not specifically limited in this embodiment of the present application.
  • S402: When the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data processing apparatus determines, according to the address information of the first data in the external storage, the address information of the second data in the Cache, where the second data is the data corresponding to the first data within the third data, the third data is the data obtained by performing endianness conversion on the cacheline containing the first data, and the size of the cacheline is ≥ 2P.
  • S403: The data processing device reads fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data.
  • S404: The data processing device cyclically shifts the fourth data right by a first byte count to obtain fifth data, where the fifth data includes the second data, and the second data is the data on the low K byte addresses of the 2P-byte fifth data.
  • Here, Index1 = ~(Address[n:0] + K - 1), where Index1 represents the first byte count, ~ represents bitwise negation, n = log2(P), and Address[n:0] represents the value of the low (n+1)-bit address of the first address of the first data in the external storage of the Cache.
  • The data is cyclically shifted to the right to achieve right alignment of the data.
  • The reason the data is right-aligned is that when the processor instruction pipeline reads data, the rightmost byte is the byte corresponding to the first address of the read instruction, so the data needs to be right-aligned before being sent to the processor instruction pipeline. This is described here once and not repeated below.
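The right-alignment rotation of step S404 can be sketched as follows, assuming 2P = 16 and the FIG. 7 values (first address ending in 0x7, K = 4, so Index1 = ~(0x7 + 3) & 0xF = 5):

```python
# Sketch of step S404: the 2P-byte word read from the Cache is cyclically
# shifted right by Index1 bytes so that the requested bytes land on the low K
# addresses (position 0 = lowest address = the byte the pipeline expects
# rightmost). 2P = 16 and Index1 = 5 are the example's assumptions.

def rotr(word: bytes, r: int) -> bytes:
    """Rotate a word right by r byte positions."""
    return bytes(word[(i + r) % len(word)] for i in range(len(word)))

fourth_data = bytes(range(16))        # 2P bytes read from the Cache
fifth_data = rotr(fourth_data, 5)     # right alignment by Index1 = 5
print(fifth_data[:4].hex())           # the second data, on the low K = 4 addresses
```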
  • S405: The data processing device sends the second data to the processor instruction pipeline.
  • Based on this scheme, the data read from the Cache is data that has already undergone endianness conversion, so no endianness conversion needs to be performed after the data is read from the Cache.
  • FIG. 5 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent according to an embodiment of the present application.
  • As shown in FIG. 5, the endianness conversion is located on the write channel of the Cache, that is, the conversion is performed while the data is written from the external storage into the Cache.
  • On a load hit, the endianness-converted data can be read directly from the Cache, and the required data is obtained by right-aligning that data.
  • The sign extension in FIG. 5 is an optional operation in the data processing method provided by the present application.
  • the specific implementation may refer to the existing processing manner, which is not specifically limited in this embodiment of the present application.
  • In summary, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the address information of the second data in the Cache is determined according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data within the third data, and the third data is the data obtained by performing endianness conversion on the cacheline containing the first data; the fourth data of 2P bytes, which includes the second data, is read from the Cache according to the address information of the second data in the Cache; the fourth data is then cyclically shifted right by the first byte count to obtain the fifth data, where the second data is the data on the low K byte addresses of the 2P-byte fifth data; and the second data is sent to the processor instruction pipeline.
  • Because the data read from the Cache is data that has already undergone endianness conversion, no endianness conversion needs to be performed after the data is read from the Cache. This avoids the increase in load-hit delay caused in the prior art by the endianness-conversion logic introduced after the Cache read when the data formats are inconsistent, thereby reducing the load-hit delay when the big-endian and little-endian data formats are inconsistent.
  • Optionally, before the fourth data is read from the Cache, the data processing method may further include the following steps:
  • The data processing device reads the cacheline containing the first data from the external storage of the Cache, performs endianness conversion on the cacheline containing the first data to obtain the third data, and writes the third data into the Cache.
  • Since the size of the cacheline is ≥ 2P and the read/write width of the Cache is 2P bytes, the third data, whose size is the size of the cacheline, may be written in a plurality of write passes. For example, if the size of the cacheline is 4P, it is written in two write passes, which is not specifically limited in this embodiment of the present application.
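The multi-pass write can be sketched as follows, assuming P = 8 and a 4P-byte cacheline split into two 2P-byte write passes; the dict-backed Cache is an illustrative model:

```python
# Sketch of writing a cacheline larger than the Cache's 2P-byte write width:
# the line is written in 2P-byte beats. P = 8 is an illustrative assumption.

P = 8

def write_line(cache: dict, base: int, line: bytes) -> int:
    """Write `line` into `cache` in 2P-byte beats; return the number of beats."""
    beats = 0
    for off in range(0, len(line), 2 * P):
        cache[base + off] = line[off:off + 2 * P]   # one 2P-byte write pass
        beats += 1
    return beats

cache = {}
beats = write_line(cache, 0x0, bytes(range(4 * P)))  # a 4P-byte cacheline
print(beats)
```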
  • Based on this scheme, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endianness conversion. Consequently, after the data is read from the Cache, no endianness conversion is required, which avoids the increase in load-hit delay caused in the prior art by the endianness-conversion logic introduced after the Cache read when the data formats are inconsistent, thereby reducing the load-hit delay when the big-endian and little-endian data formats are inconsistent.
  • Optionally, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the method may further include the following steps:
  • The data processing device reads sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data.
  • The data processing device cyclically shifts the sixth data right by a second byte count to obtain seventh data, where the seventh data includes the first data, and the first data is the data on the low K byte addresses of the 2P-byte seventh data.
  • Here, Index2 = Address[n:0], where Index2 represents the second byte count, n = log2(P), and Address[n:0] represents the value of the low (n+1)-bit address of the first address of the first data in the external storage of the Cache.
  • the data processing device sends the first data to the processor instruction pipeline.
  • the data read from the Cache is the data written to the Cache from the external storage.
  • the data in the Cache can be sent to the processor instruction pipeline when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
  • the method may further include:
  • the data processing device reads the cacheline containing the first data from the external storage of the Cache; and writes the cacheline containing the first data into the Cache.
  • Since the read/write width of the Cache is 2P bytes and the size of the cacheline is ≥ 2P, when the cacheline containing the first data is written into the Cache, it may be written in a plurality of write passes. For example, if the size of the cacheline is 4P, it is written in two write passes, which is not specifically limited in this embodiment of the present application.
  • the data in the external storage can be written into the Cache.
  • Optionally, the data processing method provided by the embodiment of the present application may further include: the data processing device acquires a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer. When the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the eighth data is cyclically shifted left by the third byte count Index3 = ~(Address[n:0] + T - 1) to obtain ninth data of 2P bytes, and the ninth data is written into the Cache.
  • the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
  • Optionally, the method may further include: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, cyclically shifting the eighth data left by Address[n:0] bytes to obtain 2P-byte data and writing it into the Cache.
  • the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
  • Assume, for example, that the processor instruction pipeline supports only the little-endian data format, the external storage supports the big-endian data format, and the size of the cacheline is 16 bytes.
  • The data processing device byte-reverses the 16-byte cacheline from the external storage and stores it in the Cache, which is equivalent to storing the 16-byte big-endian data in the Cache in little-endian format.
  • When the processor instruction pipeline initiates a read instruction, the little-endian data can be read directly from the Cache.
  • Because the cacheline is byte-reversed when the data is written into the Cache, when the data is right-aligned, the rotate-right index needs to be compensated by negation.
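The fill-time conversion can be sketched as a simple byte reversal of the cacheline, assuming the 16-byte line size of the example; everything else here is illustrative:

```python
# Sketch of the fill-time conversion: a big-endian cacheline from the external
# storage is byte-reversed as it is written into the Cache, so the Cache
# effectively holds it in little-endian form. The byte at external offset a
# ends up at Cache offset 15 - a.

def fill_cacheline(line: bytes) -> bytes:
    """Byte-reverse a cacheline on the Cache write channel."""
    return line[::-1]

external_line = bytes(range(16))
cache_line = fill_cacheline(external_line)
print(cache_line[15 - 0x7])           # the byte that was at external offset 0x7
```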
  • Assume the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at the 0x7, 0x8, 0x9, and 0xA addresses in the external storage. Since the data is stored in big-endian mode in the external storage, the entire cacheline is byte-reversed before being written into the Cache (each byte at offset a within the cacheline moves to offset 15 - a), so that:
  • B0 is written to the 0x8 address in the Cache,
  • B1 is written to the 0x7 address in the Cache,
  • B2 is written to the 0x6 address in the Cache,
  • B3 is written to the 0x5 address in the Cache.
  • When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data as shown in mem_data_o.
  • The target data B3, B2, B1, and B0 can then be obtained by cyclically shifting the output data right by the index to right-align it, where Index = ~(Address[n:0] + K - 1) = ~(0x7 + 4 - 1) = 0x5 over the low four address bits.
  • After the shift, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 are sent to the processor instruction pipeline.
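The whole FIG. 7 read path can be checked end to end under the example's assumptions (P = 8, K = 4, first address ending in 0x7, big-endian external storage, little-endian pipeline): after the byte reversal on fill and the rotate-right by ~(Address[3:0] + K - 1), the low K bytes read as a little-endian value must equal the value that was stored big-endian:

```python
# End-to-end sketch of the mismatched-endianness load: byte-reverse on fill,
# rotate right by Index1, take the low K bytes. All widths and addresses are
# the example's assumptions.

line = bytes(range(16))                              # one external cacheline
addr, K, MASK = 0x7, 4, 0xF
value = int.from_bytes(line[addr:addr + K], 'big')   # value held big-endian

cache_line = line[::-1]                              # endianness conversion on fill
index = ~(addr + K - 1) & MASK                       # rotate-right amount (Index1)
fifth = bytes(cache_line[(i + index) % 16] for i in range(16))
second = fifth[:K]                                   # low K addresses, right-aligned

print(int.from_bytes(second, 'little') == value)
```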
  • Assume the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at the 0xE, 0xF, 0x10, and 0x11 addresses in the external storage, so the access crosses a cacheline boundary. Since the data is stored in big-endian mode in the external storage, each cacheline needs to be byte-reversed before being written into the Cache, as shown in FIG. 8 (each byte at offset a within its cacheline moves to offset 15 - a):
  • B0 is written to the 0x1 address in the Cache,
  • B1 is written to the 0x0 address in the Cache,
  • B2 is written to the 0x1F address in the Cache,
  • B3 is written to the 0x1E address in the Cache.
  • At this point, the data is already stored in little-endian form in the Cache.
  • When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data as shown in mem_data_o.
  • The target data B3, B2, B1, and B0 can then be obtained by cyclically shifting the output data right by the index to right-align it, where Index = ~(Address[n:0] + K - 1) = ~(0xE + 4 - 1) = 0xE over the low four address bits.
  • After the shift, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 are sent to the processor instruction pipeline.
  • When the external storage supports the little-endian data format (endianness match), the data processing device writes the 16-byte cacheline in the external storage directly into the Cache.
  • When the processor instruction pipeline initiates a read instruction, the little-endian data can be read directly from the Cache.
  • Since the cacheline is not byte-reversed when the data is written into the Cache, when the data is right-aligned, the rotate-right index does not need to be compensated by negation.
  • Assume the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at the 0x7, 0x8, 0x9, and 0xA addresses in the external storage. Since the data is stored in little-endian mode in the external storage, the cacheline is written into the Cache unchanged, as shown in FIG. 9:
  • B0 is written to the 0x7 address in the Cache
  • B1 is written to the 0x8 address in the Cache
  • B2 is written to the 0x9 address in the Cache
  • B3 is written to the 0xA address in the Cache.
  • At this point, the data is still stored in little-endian form in the Cache.
  • When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data as shown in mem_data_o.
  • The target data can then be obtained by cyclically shifting the output data right by the index to right-align it, where Index = Address[n:0] = 0x7.
  • After the shift, the data B0, B1, B2, and B3 at addresses 0x0, 0x1, 0x2, and 0x3 are sent to the processor instruction pipeline.
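The FIG. 9 (endianness-match) path can be checked the same way; with no byte reversal on fill, rotating right by Address[n:0] alone right-aligns the requested bytes (P = 8 and K = 4 are the example's assumptions):

```python
# Sketch of the matched-endianness load: the cacheline is written into the
# Cache unchanged, and right alignment uses Index2 = Address[n:0] with no
# negation compensation.

line = bytes(range(16))                       # cacheline of little-endian data
addr, K = 0x7, 4

cache_line = line                             # written unchanged on fill
index = addr & 0xF                            # Index2 = Address[n:0]
seventh = bytes(cache_line[(i + index) % 16] for i in range(16))
first = seventh[:K]                           # the requested bytes, right-aligned

print(first == line[addr:addr + K])           # same bytes, in their original order
```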
  • Assume the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at the 0xE, 0xF, 0x10, and 0x11 addresses in the external storage, so the access crosses a cacheline boundary. Since the data is stored in little-endian mode in the external storage, each cacheline can be written into the Cache unchanged, as shown in FIG. 10: B0 is written to the 0xE address in the Cache, B1 to the 0xF address, B2 to the 0x10 address, and B3 to the 0x11 address.
  • When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data as shown in mem_data_o.
  • The target data can then be obtained by cyclically shifting the output data right by the index to right-align it, where Index = Address[n:0] = 0xE.
  • After the shift, the data B0, B1, B2, and B3 at addresses 0x0, 0x1, 0x2, and 0x3 are sent to the processor instruction pipeline.
  • The above examples are described by taking the case where the processor instruction pipeline supports only the little-endian data format and the external storage supports the big-endian or little-endian data format as an example.
  • The same processing applies when the processor instruction pipeline supports only the big-endian data format and the external storage supports the little-endian or big-endian data format, which is not specifically limited in this embodiment.
  • the solution provided by the embodiment of the present application is mainly introduced from the perspective of the data processing method performed by the data processing apparatus.
  • It can be understood that, in order to implement the above functions, the above data processing apparatus includes hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in a combination of hardware or hardware and computer software in combination with the elements and algorithm steps of the various examples described in the embodiments disclosed herein. Whether a function is implemented in hardware or computer software to drive hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.
  • The embodiments of the present application may divide the data processing device into function modules according to the foregoing method example. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software function module. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical function division; other division manners may exist in actual implementation.
  • FIG. 11 shows a possible structural diagram of the data processing apparatus 110 involved in the above embodiment.
  • The data processing apparatus 110 includes an obtaining module 1101, a determining module 1102, a reading module 1103, a shifting module 1104, and a sending module 1105.
  • The obtaining module 1101 is configured to support the data processing apparatus 110 in performing step S401 shown in FIG. 4;
  • the determining module 1102 is configured to support the data processing apparatus 110 in performing step S402 shown in FIG. 4;
  • the reading module 1103 is configured to support the data processing apparatus 110 in performing step S403 shown in FIG. 4;
  • the shifting module 1104 is configured to support the data processing apparatus 110 in performing step S404 shown in FIG. 4;
  • the sending module 1105 is configured to support the data processing apparatus 110 in performing step S405 shown in FIG. 4.
  • the data processing apparatus 110 may further include a format conversion module 1106 and a write module 1107.
  • The reading module 1103 is further configured to: before 2P bytes of data are read from the Cache according to the address information of the second data in the Cache, read the cacheline containing the first data from the external storage of the Cache.
  • The format conversion module 1106 is configured to perform big/little-endian data format conversion on the cacheline containing the first data to obtain third data; the writing module 1107 is configured to write the third data into the Cache.
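A minimal sketch of what the endianness conversion performed by the format conversion module 1106 could look like, assuming the conversion is a byte-order reversal applied to the whole cacheline; the dict-based Cache model and the function names are illustrative assumptions, not the claimed circuit.

```python
def convert_endianness(cacheline: bytes) -> bytes:
    """Reverse the byte order of a cacheline fetched from external
    storage, yielding the converted data (the "third data")."""
    return cacheline[::-1]

def fill_cache(cache: dict, line_addr: int, cacheline: bytes) -> None:
    """Write the converted cacheline into a toy dict-based Cache model,
    so later reads can be served in the pipeline's byte order."""
    cache[line_addr] = convert_endianness(cacheline)

cache = {}
fill_cache(cache, 0x0, bytes([0x01, 0x02, 0x03, 0x04]))
```

Whether the reversal is applied per cacheline or per data element is left open here; the text above only fixes that a conversion between big-endian and little-endian formats takes place before the line is written into the Cache.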
  • The reading module 1103 is further configured to support the data processing apparatus 110 in performing step S406 shown in FIG. 6; the shifting module 1104 is further configured to support the data processing apparatus 110 in performing step S407 shown in FIG. 6; and the sending module 1105 is further configured to support the data processing apparatus 110 in performing step S408 shown in FIG. 6.
  • The reading module 1103 is further configured to: before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage, read the cacheline containing the first data from the external storage of the Cache; the writing module 1107 is further configured to write the cacheline containing the first data into the Cache.
  • The obtaining module 1101 is further configured to acquire a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the byte count T of the eighth data, where T ≤ P and T is an integer.
  • the writing module 1107 is further configured to write the ninth data into the Cache.
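For the write path, a common way to handle a narrow store of T bytes (T ≤ P) is to merge it into the cached line at the byte offset given by the low address bits. The sketch below illustrates such merging under that assumption; the value of P, the offset handling, and the function name are hypothetical and not the claimed treatment of the eighth and ninth data.

```python
P = 8  # assumed: half of the 2P-byte Cache read/write width

def merge_store(cacheline: bytearray, offset: int, data: bytes) -> bytearray:
    """Overwrite T bytes of the cacheline with the store data, leaving
    the remaining bytes of the line unchanged."""
    assert len(data) <= P, "a write instruction carries at most P bytes"
    cacheline[offset:offset + len(data)] = data
    return cacheline

line = bytearray(2 * P)              # one 2P-byte cacheline, zero-filled
merge_store(line, 0x4, b"\xAA\xBB")  # T = 2 bytes written at offset 0x4
```

A full implementation would also apply the big/little-endian conversion described above before the merge when the formats of the pipeline and the external storage differ.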
  • FIG. 13 is a schematic diagram showing a possible structure of the data processing apparatus involved in the foregoing embodiment.
  • The data processing apparatus 130 includes a processing module 1301 and a communication module 1302.
  • The processing module 1301 may be configured to perform the operations performed by the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the format conversion module 1106, and the writing module 1107 in FIG. 11 or FIG. 12; and the communication module 1302 may be configured to perform the operations of the sending module 1105 in FIG. 11 or FIG. 12.
  • For details, refer to the embodiment shown in FIG. 11 or FIG. 12; details are not described herein again.
  • In the foregoing, the data processing device is presented either in a form that divides each functional module corresponding to each function, or in a form that divides the functional modules in an integrated manner.
  • A “module” herein may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the functionality described above.
  • The data processing device 110 or the data processing device 130 may take the form shown in FIG. 3.
  • The obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, and the sending module 1105 in FIG. 11 may be implemented by the processor 301 and the memory 303 of FIG. 3.
  • Specifically, these modules may be implemented by the processor 301 invoking the application code stored in the memory 303, which is not limited in this embodiment.
  • The obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the sending module 1105, the format conversion module 1106, and the writing module 1107 in FIG. 12 may be implemented by the processor 301 and the memory 303 of FIG. 3. Specifically, these modules may be implemented by the processor 301 invoking the application code stored in the memory 303; the embodiment of the present application does not impose any limitation on this.
  • The processing module 1301 and the communication module 1302 in FIG. 13 may be implemented by the processor 301 and the memory 303 of FIG. 3. Specifically, they may be implemented by the processor 301 invoking the application code stored in the memory 303; the embodiment of the present application does not impose any limitation on this.
  • The data processing device provided by the embodiments of the present application can be used to perform the foregoing data processing method; therefore, for the technical effects that can be obtained, refer to the foregoing method embodiments, and details are not described herein again.
  • All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When implemented by a software program, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wirelessly (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

The present application relates to a data processing method and apparatus, for use in reducing load access latency when a big-endian data format and a little-endian data format are mismatched. The method comprises: acquiring a read instruction sent by a processor instruction pipeline, the read instruction comprising address information, in an external storage of a Cache, of first data to be read, where the read/write width of the Cache is 2P bytes and the byte count K of the first data is less than or equal to P; when the data format supported by the external storage matches the data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, the second data being the data in third data that corresponds to the first data, and the third data being data obtained by converting a cacheline comprising the first data between the big-endian data format and the little-endian data format; reading 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache; cyclically shifting the fourth data to the right by a first byte count to obtain fifth data; and sending the second data in the fifth data to the processor instruction pipeline.
PCT/CN2018/077026 2017-03-16 2018-02-23 Procédé et dispositif de traitement de données WO2018166337A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710157711.3A CN108628638B (zh) 2017-03-16 2017-03-16 数据处理方法及装置
CN201710157711.3 2017-03-16

Publications (1)

Publication Number Publication Date
WO2018166337A1 true WO2018166337A1 (fr) 2018-09-20

Family

ID=63521829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077026 WO2018166337A1 (fr) 2017-03-16 2018-02-23 Procédé et dispositif de traitement de données

Country Status (2)

Country Link
CN (1) CN108628638B (fr)
WO (1) WO2018166337A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109683959B (zh) * 2018-12-24 2020-12-01 安谋科技(中国)有限公司 处理器的指令执行方法及其处理器
CN113157635B (zh) * 2019-09-25 2024-01-05 支付宝(杭州)信息技术有限公司 在fpga上实现合约调用的方法及装置
CN111125715A (zh) * 2019-12-18 2020-05-08 深圳忆联信息系统有限公司 基于固态硬盘的tcg数据处理加速方法、装置、计算机设备及存储介质
CN111258785B (zh) * 2020-01-20 2023-09-08 北京百度网讯科技有限公司 数据洗牌方法和装置
CN113766270B (zh) * 2021-02-26 2024-06-18 北京沃东天骏信息技术有限公司 视频播放方法、系统、服务器、终端设备、以及电子设备
CN113778526B (zh) * 2021-11-12 2022-02-22 北京微核芯科技有限公司 一种基于Cache的流水线的执行方法及装置
CN117093510B (zh) * 2023-05-30 2024-04-09 中国人民解放军军事科学院国防科技创新研究院 大小端通用的缓存行高效索引方法

Citations (3)

Publication number Priority date Publication date Assignee Title
US20040230765A1 (en) * 2003-03-19 2004-11-18 Kazutoshi Funahashi Data sharing apparatus and processor for sharing data between processors of different endianness
CN102135941A (zh) * 2010-08-26 2011-07-27 华为技术有限公司 从缓存写数据到内存的方法和装置
CN104156323A (zh) * 2014-08-07 2014-11-19 浪潮(北京)电子信息产业有限公司 一种高速缓冲存储器的数据块长度自适应读取方法及装置

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9489307B2 (en) * 2012-10-24 2016-11-08 Texas Instruments Incorporated Multi domain bridge with auto snoop response


Also Published As

Publication number Publication date
CN108628638B (zh) 2021-02-09
CN108628638A (zh) 2018-10-09

Similar Documents

Publication Publication Date Title
WO2018166337A1 (fr) Procédé et dispositif de traitement de données
US11294675B2 (en) Writing prefetched data into intra-core caches of cores identified by prefetching instructions
TWI463332B (zh) 在單一指令多資料之資料處理器中提供擴充尋址模式
CN110647480A (zh) 数据处理方法、远程直接访存网卡和设备
US20150277867A1 (en) Inter-architecture compatability module to allow code module of one architecture to use library module of another architecture
US7844752B2 (en) Method, apparatus and program storage device for enabling multiple asynchronous direct memory access task executions
US11650754B2 (en) Data accessing method, device, and storage medium
US9063860B2 (en) Method and system for optimizing prefetching of cache memory lines
US10228869B1 (en) Controlling shared resources and context data
WO2020000482A1 (fr) Procédé, appareil, et système de lecture de données reposant sur nvme
US11436146B2 (en) Storage control apparatus, processing apparatus, computer system, and storage control method
US20140143519A1 (en) Store operation with conditional push
US10936517B2 (en) Data transfer using a descriptor
US8359433B2 (en) Method and system of handling non-aligned memory accesses
KR20170073688A (ko) 정렬되지 않은 주소에서 메모리 내의 데이터를 접근하기 위한 방법
CN110781107B (zh) 基于dram接口的低延迟融合io控制方法和装置
WO2019084789A1 (fr) Contrôleur d'accès direct à la mémoire, procédé de lecture de données et procédé d'écriture de données
US20160055107A1 (en) Data processing apparatus and method
US7895387B1 (en) Devices and methods for sharing common target device with two different hosts according to common communication protocol
US10802828B1 (en) Instruction memory
US11275683B2 (en) Method, apparatus, device and computer-readable storage medium for storage management
US9608842B2 (en) Providing, at least in part, at least one indication that at least one portion of data is available for processing
US11487680B2 (en) Apparatus and method for burst mode data storage
US20240311321A1 (en) Multi-core system and reading method
CN118633075A (zh) 一种处理请求的方法、装置及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18766836

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18766836

Country of ref document: EP

Kind code of ref document: A1