CN115269454A - Data access method, electronic device and storage medium - Google Patents


Info

Publication number
CN115269454A
CN115269454A CN202210893893.1A
Authority
CN
China
Prior art keywords
data
line
register
cache
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210893893.1A
Other languages
Chinese (zh)
Inventor
王云贵
郝成龙
王俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202210893893.1A priority Critical patent/CN115269454A/en
Publication of CN115269454A publication Critical patent/CN115269454A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to the technical field of computers, in particular to a data access method, electronic equipment and a storage medium.

Description

Data access method, electronic device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data access method, an electronic device, and a storage medium.
Background
When the processor receives a data access request, it fetches an instruction according to the data requested and calculates the access address of the instruction, performs addressing access on the data memory through the access address to determine the storage location of the data in the data memory, and then reads the data and returns the result. For example, when the processor performs addressing access in the cache using the access address, the access address is divided into a line tag bit, an index bit and an offset bit: the index bit determines which set in the cache the access address corresponds to, the line tag bit determines which cache line in that set the access address corresponds to, the valid bit in front of the cache line confirms that the data requested by the access address is actually stored in that cache line, and finally the data requested by the access address can be read from the cache line through the offset bit and the result returned.
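The tag/index/offset decomposition described above can be sketched in a few lines. This is a simplified illustration with an assumed geometry (64-byte lines, 128 sets), not taken from the patent:

```python
# Illustrative split of an access address into line tag, index and offset
# bits for a set-associative cache. The geometry below is an assumed example.
LINE_SIZE = 64   # bytes per cache line -> 6 offset bits
NUM_SETS = 128   # sets in the cache   -> 7 index bits

OFFSET_BITS = LINE_SIZE.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1

def split_address(addr: int):
    """Return (tag, index, offset) for an access address."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With this geometry an address carries 6 offset bits and 7 index bits, and everything above bit 12 forms the line tag used for line matching.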
However, the same cache line is usually used to store multiple data of the same type or associated data. When the processor receives an access request for multiple such data, then, in the process of performing addressing access in the cache according to the access addresses of the multiple corresponding instructions, the processor must, for each instruction, determine the cache line through the line tag bit in that instruction's access address and then confirm through the valid bit in front of the cache line that the matched cache line is correct. That is, multiple accesses to data in the same cache line repeatedly re-confirm, through the valid bit, a cache line that has already been determined, so data access efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a data access method by which an electronic device, on receiving access requests for multiple data stored in the same cache line of the cache, can read the data from a register, thereby improving the execution speed of the processor and reducing its execution power consumption.
In a first aspect, the present invention provides a data access method applied to an electronic device, including: the electronic device receives a first access request, where the first access request requests access to first data at a first access address; determining whether the first data is stored in a register; in response to the first data being stored in the register, reading the first data from the register, where the register stores the entire line of data of a first cache line of the cache corresponding to the first data; and in response to the first data not being stored in the register, reading the first data from a second cache line of the cache.
In the embodiment of the invention, when the electronic device receives a first access request, the control unit acquires the request instruction corresponding to the first access request and sends it to the arithmetic unit; the arithmetic unit calculates the first access address of the request instruction and determines whether the first data corresponding to the first access address is stored in a register. If the first data is not stored in the register, the processor reads the first data from the cache; if the first data is stored in the register, the processor reads the first data from the register. Since the register stores the entire line of data of the cache line (the first cache line) corresponding to the first data in the cache, when the electronic device subsequently receives access requests for multiple data stored in the first cache line, the processor can read the data directly from the register.
In a possible implementation of the first aspect, the method further includes: in response to the first data not being stored in the register, writing the entire line of data of the second cache line of the cache into the register.
In the embodiment of the present invention, the entire line of data of the second cache line corresponding to the first data in the cache is written into the register, so that when the electronic device subsequently receives access requests for multiple data stored in the second cache line, the processor can read the data directly from the register.
In a possible implementation of the first aspect, the entire line of data of the second cache line includes a plurality of first sub-data; the electronic device receives a preset number of access requests, where the preset number of access requests request access to a data set including a plurality of second sub-data; and any second sub-data corresponds to one first sub-data in the entire line of data of the second cache line.
In an embodiment of the present invention, the preset number indicates how many access requests for data stored in the second cache line the electronic device receives after the entire line of data of the second cache line corresponding to the first data has been written from the cache into the register.
In a possible implementation of the first aspect, writing the entire line of data of the second cache line from the cache into the register includes: deleting the entire line of data of the first cache line from the register, and then writing the entire line of data of the second cache line into the register.
In a possible implementation of the first aspect, determining whether the first data is stored in a register includes: determining whether the line tag bit of the first access address is the same as the line identifier of the first cache line; if they are the same, the first data is stored in the register; if not, the first data is not stored in the register.
In the embodiment of the invention, to determine whether the register stores the first data, the processor only needs to determine whether the line tag bit of the access address is the same as the line identifier of the first cache line; if they are the same, the register stores the first data.
In a possible implementation of the first aspect, reading the first data from the second cache line of the cache includes: matching the line tag bit of the first access address against the line identifier and the valid bit of the second cache line to determine that the first data is stored in the second cache line, and then reading the first data from the second cache line of the cache.
In the embodiment of the present invention, to determine whether the cache stores the first data, the processor must go through the processes of set selection, line matching, line determination and block extraction (see fig. 2 described below); this determination process is complex, the execution speed of the processor is slow, and the execution power consumption is high.
In a second aspect, the present invention provides a computing apparatus for use in an electronic device, the apparatus comprising: the control unit is used for extracting an instruction of the electronic equipment for receiving the access request; the arithmetic unit is used for calculating the access address of the instruction and judging whether the data of the access address is stored in the register or not; and the internal storage unit and/or the external storage unit are used for storing the data of the access address.
In a possible implementation of the second aspect, the correspondence between the access address and the register storing the data of the access address is stored in the arithmetic unit.
In a third aspect, an embodiment of the present invention provides a readable storage medium, where instructions are stored, and when executed on an electronic device, the instructions cause the electronic device to implement the first aspect and any one of the data access methods provided by various possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes: a memory to store instructions for execution by one or more processors of an electronic device; and a processor, which is one of the processors of the electronic device, configured to execute the instructions stored in the memory to implement the first aspect and any one of the data access methods provided by the various possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a program product, which, when run on an electronic device, enables the electronic device to implement any one of the data access methods provided in the foregoing first aspect and the various possible implementations of the foregoing first aspect.
Drawings
FIG. 1 illustrates a block diagram of a computing device 210, according to some embodiments of the invention;
FIG. 2 illustrates a schematic diagram of cache addressing, according to some embodiments of the invention;
FIG. 3 illustrates a block diagram of another computing device 220, in accordance with some embodiments of the invention;
FIG. 4 illustrates a block diagram of an address comparator, according to some embodiments of the invention;
FIG. 5 illustrates a flow diagram of a method of data access, according to some embodiments of the invention;
fig. 6 illustrates a schematic diagram of an electronic device 100, according to some embodiments of the invention.
Detailed Description
Illustrative embodiments of the invention include, but are not limited to, a data access method, an electronic device, and a storage medium.
The technical scheme of the invention is described below with reference to the accompanying drawings.
FIG. 1 illustrates a block diagram of a computing device 210, according to some embodiments of the invention. As shown in fig. 1, the computing device 210 may include a processor 200 and an external storage unit 204, the processor 200 may include a control unit 201, an arithmetic unit 202, an internal storage unit 203, the internal storage unit 203 may include a Cache (Cache) 2031 and a Register (Register) 2032, and the external storage unit 204 may include a main memory, a local memory (e.g., a disk), a remote memory (e.g., a distributed file system, a network server), and the like.
When the processor 200 receives a data access request, the control unit 201 obtains a request instruction (hereinafter referred to as an instruction) corresponding to the data from the corresponding instruction register and sends it to the arithmetic unit 202. The arithmetic unit 202 then calculates the access address of the data requested by the instruction, and the processor 200 can perform addressing access to the internal storage unit 203 or the external storage unit 204 according to the access address to read the data and return a result. Because directly accessing the external storage unit 204 is slow, data historically accessed by the processor 200, or data it is predicted to access, can be stored in the cache 2031, so that the data can be accessed directly from the cache 2031 and the speed at which the processor 200 accesses data is improved. Accordingly, after the control unit 201 obtains the instruction and the arithmetic unit 202 calculates its access address, the processor 200 may first perform addressing access in the cache 2031: on a cache hit, the processor 200 reads the data directly from the cache 2031 and returns it; on a cache miss, the processor 200 reads the data from the external storage unit 204, returns it, and simultaneously stores the data read from the external storage unit 204 into the cache 2031, so that subsequent instructions for the same data can access it directly from the cache 2031. It is understood that the speed at which the processor 200 accesses data from the cache 2031 is greater than the speed at which it accesses data from the external storage unit 204.
In some embodiments of the present invention, when data is stored in the cache 2031, multiple data of the same type or associated data may be stored in the same cache line (cache line); for example, since the data types of the multiple data included in an array are necessarily the same, the array may be stored in the same cache line when stored in the cache 2031.
Further, when the processor 200 receives a data access request for multiple data, and the multiple data are of the same type or related data, the control unit 201 may correspondingly extract multiple instructions from the instruction register according to the multiple data requested to be accessed, and then send the multiple instructions to the arithmetic unit 202 to obtain multiple access addresses, and the processor 200 performs addressing access on the internal storage unit 203 or the external storage unit 204 according to the multiple access addresses to return a result. For example, the control unit 201 may generate an instruction queue according to an order of fetching a plurality of instructions, sequentially send the plurality of instructions to the arithmetic unit 202 according to the instruction queue, sequentially calculate an access address, and sequentially perform an addressing access to the internal storage unit 203 or the external storage unit 204 according to the obtained access address to return a result. It is understood that a plurality of instructions may sequentially and continuously read data from the internal storage unit 203 or the external storage unit 204 in the order of the instruction queue and return the result.
In some embodiments of the present invention, an array may include a plurality of data, and the data included in one array are of the same type or associated. Further, when the processor 200 receives a data access request for an array, if the array is stored in the cache 2031, the data included in the array are necessarily stored in the same cache line of the cache 2031. When the processor 200 performs addressing access in the cache 2031 according to the instructions corresponding to those data, the instructions may sequentially and continuously perform addressing access in the cache 2031 in a pipelined manner, and on each addressing access the corresponding data can be read and a result returned once the cache line is matched.
However, because the storage structure of the cache 2031 is complex (the storage structure of the cache 2031 will be described in detail later), matching a cache line requires first determining the cache line through the line tag bit in the instruction's access address, and then determining through the valid bit in front of the cache line whether the matched cache line actually stores the data requested. When the multiple instructions corresponding to the multiple data of the above-mentioned array read data in the cache 2031, each instruction must first determine the cache line through its line tag bit and then confirm the match through the valid bit. Since the multiple data of one array are necessarily stored in the same cache line, every one of these consecutive accesses re-confirms, through the valid bit, a cache line that has already been determined, so data access efficiency is low.
It should be particularly noted that the present invention is described by taking a plurality of data included in one array as an example of a plurality of data of the same type or associated data, and in other embodiments, a plurality of data of the same type or associated data may also be other data, and the present invention is not limited herein.
Therefore, when multiple instructions request access to data stored in the same cache line, the processor 200 may write the data of that cache line into the register 2032 on the cache hit of the first of those instructions. Since the data requested by all of the instructions is stored in the same cache line, subsequent instructions can directly access the cache-line data already stored in the register 2032 and return results. Because the latency of accessing data from the register 2032 is lower than the latency of accessing data from the cache 2031, reading the data directly from the register 2032 increases the execution speed of the processor 200.
Some embodiments of the invention are described below using the example where the processor 200 receives a data access request for an array stored in the cache 2031.
In some embodiments of the present invention, after the processor receives a data access request for an array, the control unit 201 sequentially extracts instructions from the instruction register according to the order of the data in the array and sequentially sends them to the arithmetic unit 202 to calculate the access address of each instruction, and the processor 200 reads the data according to the sequentially obtained access addresses and returns results. Because the data included in one array are stored in the same cache line, while the first instruction performs addressing access in the cache 2031 to read its data and return a result, the data of that cache line is synchronously stored into the register 2032, so that subsequent instructions can read the data directly from the register 2032 without performing addressing access in the cache 2031.
The process of the first instruction performing the addressing access in the cache 2031 is as follows:
in some embodiments of the present invention, the data storage structure of the cache 2031 may be divided into sets, lines and blocks. The storage structure of the cache 2031 may be divided into S sets, one set includes E cache lines, one cache line includes B storage blocks, and the data requested to be accessed is stored in the storage blocks; each set, line and block has a corresponding identifier, and the storage location of data in the cache 2031 is represented by the identifiers of its set, line and block. When performing addressing access in the cache 2031 according to the access address of an instruction, the processor 200 may divide the access address into a line tag bit (tag), an index bit (index) and an offset bit (offset), match the identifier of a set in the cache 2031 according to the index bit, match the identifier of a line in the set according to the line tag bit, and match the identifier of a block in the line according to the offset bit, thereby matching the storage location in the cache 2031 of the data corresponding to the access address. It should be noted that the cache 2031 may be a multi-level cache, e.g., a first-level cache, a second-level cache and a third-level cache. When determining whether the cache 2031 is hit, it is first determined whether the first-level cache is hit; if not, it is sequentially determined whether the second-level and third-level caches are hit; and when none of the first- to third-level caches is hit, the data is finally read from the external storage unit 204 according to the access address and a result is returned. Some embodiments of the present invention are described below taking a cache 2031 with only one level as an example.
As shown in fig. 2, for example, the storage structure of the cache 2031 may include two sets, set A and set B; one set includes three lines, e.g., set A includes lines A1, A2 and A3; and one line includes four blocks, e.g., line A2 includes blocks a, b, c and d. It can be understood that the sets are identified by A and B, the lines by A1, A2 and A3, and the blocks by a, b, c and d. Suppose the access address of the first instruction is access address C. When access address C performs addressing access in the cache 2031, the processor 200 divides it into a line tag bit, an index bit and an offset bit: the index bit determines which set of the cache 2031 access address C corresponds to, the line tag bit determines which line of that set it corresponds to, the valid bit in front of the line confirms that the data requested by access address C is stored in that line, and finally the offset bit determines which block of the line the data is stored in. After the block is determined, the data of the access address can be read from the block and returned.
It can be understood that the valid bit confirms in advance that the data requested by the access address is actually stored in the line, avoiding the situation in which the line tag bit matches the line but the offset bit cannot match a block. Once the valid-bit check passes, the data requested by the access address can be matched to a block, so the result of addressing the cache 2031 after the valid bit is confirmed is called a cache hit, i.e., the data requested by the access address is stored in the cache 2031. Cache hits are common knowledge in the art and are therefore not described in detail here.
Specifically, the processor 200 may determine through the index bit in access address C that access address C corresponds to set A in the cache 2031, and after determining the set, determine the line by matching the line tag bit against lines A1, A2 and A3 in set A. Whether the line tag bit in the access address matches a line in the cache 2031 is indicated by the valid bit in front of the line identifier, which may be 1 or 0: if the line tag bit matches the line correctly, the valid bit is 1, meaning the data of access address C is stored in the cache 2031; if the match fails, the valid bit is 0, meaning the data of access address C is not stored in the cache 2031 and the external storage unit 204 must be accessed. As shown in fig. 2, when access address C matches line A2 in set A through the line tag bit, the valid bit is 1, indicating that the match is correct, i.e., it is determined that the data of access address C is stored in the cache 2031; finally, the offset bit of access address C determines which block of line A2 the data is stored in. As shown in fig. 2, the offset bit determines that the data of access address C is stored in block c of line A2 in set A, so the processor 200 can read the data of access address C from set A, line A2, block c of the cache 2031 and return the result. It can be understood that when the processor 200 performs addressing access in the cache 2031, at least the processes of set selection, line matching, line determination and block extraction are required to determine the storage location of the data in the cache 2031 before the data can be read back, so the addressing process is complicated.
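The set-selection, line-matching, valid-bit and block-extraction steps described above can be sketched as a minimal software model of the fig. 2 example. The class names and the two-set, three-line, four-block geometry are illustrative, not the patent's hardware:

```python
# Minimal model of a set-associative cache lookup: set selection by index,
# line matching by tag, valid-bit confirmation, then block extraction.
class CacheLine:
    def __init__(self):
        self.valid = False        # valid bit in front of the line identifier
        self.tag = None           # line identifier matched against the tag bits
        self.blocks = [None] * 4  # blocks a, b, c, d as in fig. 2

class TinyCache:
    def __init__(self, num_sets=2, lines_per_set=3):
        # two sets (A, B), three lines per set, as in the fig. 2 example
        self.sets = [[CacheLine() for _ in range(lines_per_set)]
                     for _ in range(num_sets)]

    def lookup(self, tag, index, offset):
        """Return (hit, data). A hit requires both a tag match and valid == 1."""
        for line in self.sets[index]:           # line matching within the set
            if line.valid and line.tag == tag:  # valid bit confirms the hit
                return True, line.blocks[offset]  # block extraction
        return False, None                      # miss: fall back to memory
```

Even in this toy model, every lookup repeats the tag comparison and the valid-bit check, which is the repeated work the patent's register path avoids.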
In some embodiments of the present invention, while the first instruction of the data access request of one array is addressed in the cache 2031 to read the data and return the result, the data of the corresponding line (hereinafter, the line is referred to as cache line) hit when the first instruction cache 2031 is addressed is stored in the register 2032, so that the subsequent instruction of the data access request of one array can directly read the data from the register 2032 without accessing the cache 2031.
FIG. 3 illustrates a block diagram of another computing device 220, according to some embodiments of the invention. As shown in fig. 3, the arithmetic unit 202 for calculating an instruction address in the apparatus 220 may include an address comparator 2021, where the address comparator 2021 is configured to determine whether an access address of data requested to be accessed by the instruction calculated by the arithmetic unit 202 is a target access address.
In some embodiments of the present invention, the arithmetic unit 202 may be provided with an address comparator in which the line identifier of the cache line stored in the register 2032 is recorded. When the arithmetic unit 202 calculates the access address of an instruction, the address comparator determines whether the access address matches the stored line identifier, i.e., whether the line tag bit in the access address matches the line identifier of the cache line stored in the address comparator. If they match, the data requested by the instruction is stored in the register 2032 and can be read directly from the register 2032 to return a result. Because the data included in one array necessarily reside in the same cache line, the line tag bits in the access addresses of the instructions corresponding to those data are necessarily the same. Therefore, after the first instruction's cache hit, the data of the corresponding cache line is stored into the register 2032 and the line identifier of that cache line is stored into the address comparator; when a subsequent instruction's access address is calculated, the address comparator will find that its line tag bit necessarily matches the stored line identifier, i.e., the data requested by the subsequent instruction is stored in the register 2032, so the cache 2031 need not be accessed, and the data is read directly from the register 2032 and the result returned.
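The comparator's fast path might be sketched as follows. The class and method names are illustrative assumptions; the patent describes hardware, not this software model:

```python
# Sketch of the address-comparator fast path: after the first cache hit the
# whole line is copied into registers, and any later access whose tag matches
# the recorded line identifier is served without re-addressing the cache.
class AddressComparator:
    def __init__(self):
        self.line_id = None        # line identifier of the line in registers
        self.register_line = None  # whole-line data held in the register file

    def fill(self, line_id, line_data):
        """Called on the first cache hit: record the id and copy the line."""
        self.line_id = line_id
        self.register_line = list(line_data)

    def read(self, tag, offset, cache_read):
        # Fast path: tag matches the stored line identifier, so the data is
        # in the registers and no valid-bit check or cache lookup is needed.
        if tag == self.line_id and self.register_line is not None:
            return self.register_line[offset]
        return cache_read(tag, offset)  # slow path: normal cache addressing
```

A burst of accesses to one array would call `fill` once and then take the fast path for every remaining element of the array.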
In some embodiments of the present invention, when the data of one cache line is stored by multiple registers 2032 (for example, when the data size of one cache line is 256 bytes and the line is stored by four 64-byte registers 2032), then, when storing the line identifier of the cache line in the address comparator, the correspondence between the part of the line's data held by each register 2032 and that register's identifier is also stored in the address comparator, so that a subsequent instruction can quickly match, through this correspondence, which register 2032 holds the data it requests.
For example, as shown in fig. 2, if the data size of cache line A2 is 256 bytes, the sizes of blocks a, b, c and d in cache line A2 are each 64 bytes; if the data of cache line A2 is stored by four 64-byte registers 2032, each register 2032 can store the data of one block, so the block identifier and the identifier of the register 2032 storing that block's data can be stored in the address comparator as a correspondence. As shown in fig. 2 and fig. 4, after the first instruction's cache hit, the line identifier A2 of the hit cache line is stored in the address comparator, the data of the cache line is divided into four blocks stored in registers 401 to 404, and the correspondence between the block identifiers a to d and the identifiers of the registers 2032 storing them (e.g., 401, 402, 403, 404) is stored in the address comparator. After determining that the line tag bit in the access address matches the line identifier stored in the address comparator, the block identifier can be matched through the offset bit in the access address, and the register 2032 storing that block's data can then be found through the correspondence. Thus there is no need to determine through the valid bit in front of the line identifier whether the cache line stores the requested data: once the address comparator matches the line tag bit with the line identifier, it is determined that the requested data is stored in the register 2032, which improves the speed of the access.
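The 256-byte-line, four-register example can be sketched as follows. The `distribute_line` helper and the register identifiers 401 to 404 follow the figure's labels but are otherwise illustrative:

```python
# Split a 256-byte cache line across four 64-byte registers and build the
# block-id -> (register-id, data) correspondence table that the address
# comparator would hold, following the fig. 2 / fig. 4 example.
LINE_BYTES = 256
REG_BYTES = 64

def distribute_line(line_data, reg_ids=(401, 402, 403, 404)):
    """Return {'a': (401, chunk), 'b': (402, chunk), ...} for one line."""
    assert len(line_data) == LINE_BYTES and len(reg_ids) == 4
    table = {}
    for i, (block_id, reg_id) in enumerate(zip("abcd", reg_ids)):
        table[block_id] = (reg_id, line_data[i * REG_BYTES:(i + 1) * REG_BYTES])
    return table
```

With this table, an offset bit that selects block c immediately names register 403 as the one holding the data, with no valid-bit check.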
In some embodiments of the present invention, after the data access requests of one array are completed, when the processor 200 receives a data access request for another array, the processor may, after the first instruction of the other array hits in the cache, update the data of the hit cache line into the registers 2032 and synchronously update the line identifier and the correspondence in the address comparator, so that the instructions subsequent to the first instruction of the other array can read the data from the registers 2032 and return results, thereby improving data access efficiency.
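The update described above — replacing the register copy and the comparator state when a different array's first instruction hits a new cache line — might be sketched as follows. All names here are hypothetical, not from the patent.

```python
# Illustrative sketch: on the first cache hit for a new array, the old
# line's register copy is overwritten and the address comparator's line
# identifier is updated synchronously with it.

class RegisterFastPath:
    def __init__(self):
        self.line_id = None   # line identifier held by the address comparator
        self.registers = []   # register copies of the cached line's blocks

    def on_first_hit(self, line_id, line_data, block_size=64):
        """Split the hit line into block-sized register copies and
        synchronously record the new line identifier."""
        self.registers = [line_data[i:i + block_size]
                          for i in range(0, len(line_data), block_size)]
        self.line_id = line_id

fp = RegisterFastPath()
fp.on_first_hit("A2", bytes(256))         # first array's cache line
fp.on_first_hit("B7", bytes(range(256)))  # new array: state is replaced
assert fp.line_id == "B7"
assert len(fp.registers) == 4
```

Keeping the two updates in one step matters: if the comparator held the new identifier while the registers still held the old line's data, a subsequent instruction would read stale bytes.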
The following description is given in conjunction with the structure of the computing apparatus 220 shown in fig. 3 above.
FIG. 5 illustrates a flow diagram of a method of data access, according to some embodiments of the invention. As shown in fig. 5, the method flow includes the following steps:
S501: when receiving a data access request for a plurality of data, the processor 200 correspondingly fetches a plurality of instructions and calculates the access addresses of the plurality of instructions; wherein the plurality of data are stored in the same cache line.
In some embodiments of the present invention, when the processor 200 receives a data access request for a plurality of data, the control unit 201 fetches an instruction from the instruction register according to the data requested to be accessed and sends it to the arithmetic unit 202. The arithmetic unit 202 then calculates the access address of the data requested by the instruction, and the processor 200 may access the cache 2031 through the access address to read and return the data. For example, because the data elements contained in an array necessarily have the same data type, the array will be stored in the same cache line when stored in the cache 2031.
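Why an array's elements can land in one cache line is visible from how an access address splits into fields. The field widths below are assumptions for illustration, not values from the patent: with 64-byte lines, addresses that differ only in their offset bits share the same line tag and set index, hence the same cache line.

```python
# Illustrative address split: line tag | set index | block offset.

OFFSET_BITS = 6   # 64-byte cache line -> 6 offset bits (assumed)
INDEX_BITS = 7    # 128 sets (assumed)

def split_address(addr):
    """Split an access address into (line tag, set index, block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

base = 0x4000                              # hypothetical line-aligned array start
addrs = [base + 4 * i for i in range(8)]   # eight consecutive 4-byte elements
lines = {split_address(a)[:2] for a in addrs}
assert len(lines) == 1                     # all eight elements: one cache line
```

With a line-aligned base address, the eight element addresses differ only in their low 6 bits, so every access carries the same line tag — which is exactly the condition the method exploits.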
S502: the address comparator 2021 determines whether the data requested to be accessed by the access address is present in the register 2032.
If the address comparator 2021 determines that the data requested to be accessed by the access address is stored in the register 2032, proceed to step S502B; otherwise, proceed to step S502A.
In some embodiments of the present invention, since the plurality of data are stored in the same cache line, the line tag bits in the access addresses of the plurality of data are the same. If each of the plurality of instructions had to read and return data by accessing the cache 2031, the storage location of the data in the cache 2031 would have to be determined each time through group selection, line matching, line determination, and block extraction (as shown in fig. 2), so data access would be slow. Therefore, when the first instruction of the plurality of instructions hits in the cache, the data of the hit cache line may be stored in the register 2032. Since the plurality of data requested to be accessed are stored in the same cache line, the instructions subsequent to the first instruction can read data directly from the register 2032 without accessing the cache 2031. Meanwhile, since the latency of the processor 200 accessing data from the register 2032 is lower than that from the cache 2031, returning data directly from the register 2032 improves the data access speed of the processor 200.
S502A: processor 200 reads the data return from cache 2031.
In some embodiments of the present invention, step S502A may be understood as the data access step of the first instruction of the plurality of instructions. For the specific data access process, reference may be made to the description of fig. 2: the data is read and a result returned after group selection, line matching, line determination, and block extraction in the cache 2031 using the line tag bits, index bits, and offset bits of the access address, which is not described again here.
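The conventional lookup path of step S502A — set selection by the index bits, line matching against the valid bit and line tag, then block extraction by the offset bits — can be sketched with a toy cache. The geometry (64-byte lines, 4 sets, 2 ways) and all names are illustrative assumptions.

```python
# Minimal sketch of the conventional cache lookup path.

OFFSET_BITS, INDEX_BITS = 6, 2   # 64-byte lines, 4 sets (toy sizes)

class ToyCache:
    def __init__(self, ways=2):
        # each set holds `ways` lines: valid bit, line tag, line data
        self.sets = [[{"valid": False, "tag": None, "data": b""}
                      for _ in range(ways)] for _ in range(1 << INDEX_BITS)]

    @staticmethod
    def _split(addr):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        return addr >> (OFFSET_BITS + INDEX_BITS), index, offset

    def fill(self, addr, data):
        tag, index, _ = self._split(addr)
        self.sets[index][0] = {"valid": True, "tag": tag, "data": data}

    def read(self, addr):
        tag, index, offset = self._split(addr)   # set selection
        for line in self.sets[index]:            # line matching
            if line["valid"] and line["tag"] == tag:
                return line["data"][offset]      # block extraction
        return None                              # miss

cache = ToyCache()
cache.fill(0x1000, bytes(range(64)))
assert cache.read(0x1005) == 5    # hit: same line, offset 5
assert cache.read(0x2000) is None # miss: same set, different tag
```

Every access down this path repeats the valid-bit check and tag comparison across the set's ways — the per-access work that the register fast path of step S502B avoids.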
S502B: processor 200 reads the data return from register 2032.
In some embodiments of the invention, step S502B may be understood as the data access step of an instruction subsequent to the first instruction. Whether the data requested by the access address exists in the register 2032 can be judged as follows: when the first instruction hits in the cache, the line identifier of the hit cache line is stored in the address comparator; when the arithmetic unit later issues an access address, the address comparator judges whether the access address matches the line identifier of the cache line stored in it, that is, it matches the line tag bit of the access address against the stored line identifier. If they match, the data requested by the instruction is known to be stored in the register 2032, and the data can be read directly from the register 2032 to return a result.
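The S502 decision just described — match the line tag bits against the stored line identifier, take the register fast path on a match, otherwise fall back to the cache — might be written as a single dispatch function. The function name, parameters, and 6-bit offset width are illustrative assumptions.

```python
# Illustrative sketch of the S502 dispatch: register fast path on an
# address-comparator match, cache fallback (step S502A) otherwise.

def access(addr, comparator_line_id, register_copy, cache_read,
           offset_bits=6):
    """Return (source, value): where the byte came from and the byte."""
    line_tag = addr >> offset_bits           # line tag bits of the address
    offset = addr & ((1 << offset_bits) - 1)
    if line_tag == comparator_line_id:       # address comparator match
        return "register", register_copy[offset]   # S502B: fast path
    return "cache", cache_read(addr)               # S502A: full lookup

line = bytes(range(64))                      # register copy of the hit line
src, val = access(0x1C0 + 3, comparator_line_id=0x1C0 >> 6,
                  register_copy=line, cache_read=lambda a: None)
assert (src, val) == ("register", 3)         # subsequent instruction: fast path
```

Note the hit test is one equality comparison on the tag bits, with no valid-bit check and no per-way search — which is where the latency advantage over the cache path comes from.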
By the method provided by the embodiment of the present invention, when the processor 200 receives a plurality of instructions whose requested data are stored in the same cache line, the processor 200 may, upon the first instruction's cache hit, write the data of the cache line into the register 2032 and store the line identifier of the cache line in the address comparator. The address comparator then determines whether the line tag bit in the access address of each subsequent instruction matches the stored line identifier, so as to determine whether the data requested by that instruction exists in the register 2032. Because the data requested by the plurality of instructions are stored in the same cache line, the subsequent instructions can directly access the register 2032 to read the data and return results, thereby increasing the data access speed of the processor 200.
Further, fig. 6 illustrates a schematic structural diagram of an electronic device 100, according to some embodiments of the present application. As shown in fig. 6, electronic device 100 includes one or more processors 200, a system Memory 102, a Non-Volatile Memory (NVM) 103, a communication interface 104, an input/output (I/O) device 105, and system control logic 106 for coupling the processors, the system Memory 102, the NVM 103, the communication interface 104, and the input/output (I/O) device 105. Wherein:
the processor 200 may include one or more processing units, for example, processing modules or processing circuits that may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Micro-programmed Control Unit (MCU), an Artificial Intelligence (AI) processor, a Field Programmable Gate Array (FPGA), a Neural Network Processing Unit (NPU), and the like, and may include one or more single-core or multi-core processors. In some embodiments, the CPU may include a control unit 201, an arithmetic unit 202, and an internal storage unit 203, where the control unit 201 receives a data access instruction and sends it to the arithmetic unit 202 to calculate an access address, and data addressing and access are performed in the internal storage unit 203 through the access address.
The system memory 102 is a volatile memory, such as a Random Access Memory (RAM) or a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM). The system memory 102 is used to temporarily store data and/or instructions; for example, in some embodiments, the system memory 102 may be used to store instructions with which the processor 200 addresses the external storage unit 204.
Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as a Hard Disk Drive (HDD), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Solid-State Drive (SSD), and the like. In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card.
In particular, the system memory 102 and the non-volatile memory 103 may each include a temporary copy and a permanent copy of instructions 107. The instructions 107, when executed by at least one of the processors 200, cause the electronic device 100 to implement the data access methods provided by the embodiments of the present application.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the electronic device 100 to communicate with any other suitable device over one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the electronic device 100, for example the communication interface 104 may be integrated in the processor 200. In some embodiments, electronic device 100 may communicate with other devices through communication interface 104.
An input/output (I/O) device 105 may be an input device such as a keyboard, mouse, etc., an output device such as a display, etc., and a user may interact with the electronic device 100 through the input/output (I/O) device 105, such as inputting a neural network model to be run, etc.
System control logic 106 may include any suitable interface controllers to provide any suitable interfaces with other modules of electronic device 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers to provide an interface to system memory 102 and non-volatile memory 103.
In some embodiments, at least one of the processors 200 may be packaged together with logic for one or more controllers of the System control logic 106 to form a System In Package (SiP). In other embodiments, at least one of the processors 200 may also be integrated on the same Chip with logic for one or more controllers of the System control logic 106 to form a System-on-Chip (SoC).
It is understood that the configuration of electronic device 100 shown in fig. 6 is merely an example, and in other embodiments, electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
It is understood that the electronic device 100 may be any electronic device, including but not limited to a mobile phone, a wearable device (e.g., a smart watch, etc.), a tablet, a desktop, a laptop, a handheld computer, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR)/Virtual Reality (VR) device, etc., and the embodiments of the present application are not limited thereto.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), Erasable Programmable Read-Only Memories (EPROMs), Electrically Erasable Programmable Read-Only Memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory for transmitting information over the Internet in an electrical, optical, acoustical, or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodological feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not the most important consideration, and the combination of functions implemented by the logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving the technical problem proposed by the present application, which does not indicate that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the application.

Claims (10)

1. A data access method, applied to electronic equipment, characterized by comprising:
the electronic equipment receives a first access request, wherein the first access request requests to access first data of a first access address;
judging whether the first data is stored in a register or not;
in response to the first data being stored in the register, reading the first data from the register, wherein the register stores an entire line of data of a first cache line in a cache;
in response to the first data not being stored in the register, reading the first data from a second cache line of the cache.
2. The method of claim 1, further comprising:
in response to the first data not being stored in the register, writing an entire line of data of the second cache line in the cache into the register.
3. The method of claim 2, further comprising:
the entire line of data of the second cache line comprises a plurality of pieces of first sub-data;
the electronic equipment receives a preset number of access requests, wherein the preset number of access requests request access to a data set, and the data set comprises a plurality of pieces of second sub-data;
any one piece of the second sub-data corresponds to one piece of the first sub-data in the entire line of data of the second cache line.
4. The method of claim 2, wherein the writing the entire line of data of the second cache line in the cache into the register comprises:
deleting the entire line of data of the first cache line from the register, and then writing the entire line of data of the second cache line into the register.
5. The method of claim 1, wherein said determining whether the first data is stored in a register comprises:
judging whether the line tag bit of the first access address is the same as the line identifier of the first cache line;
if they are the same, the first data is stored in the register;
if they are different, the first data is not stored in the register.
6. The method of claim 1, wherein the reading the first data from the second cache line of the cache comprises:
reading the first data from the second cache line of the cache after determining, according to the matching of the line tag bit of the first access address with the line identifier and the valid bit of the second cache line, that the first data is stored in the second cache line.
7. A computing device applied to an electronic device, the device comprising:
a control unit, configured to fetch an instruction when the electronic device receives an access request;
the arithmetic unit is used for calculating an access address of the instruction and judging whether data of the access address is stored in a register or not;
and the internal storage unit and/or the external storage unit are used for storing the data of the access address.
8. The apparatus of claim 7, further comprising:
wherein a correspondence between the access address and the register storing the data of the access address is stored in the arithmetic unit.
9. An electronic device, comprising:
a memory to store instructions for execution by one or more processors of an electronic device;
and a processor, which is one of the processors of the electronic device, for executing the instructions stored in the memory to implement the method of any one of claims 1 to 6.
10. A computer-readable storage medium having instructions stored thereon, which when executed on an electronic device, cause the electronic device to implement the method of any one of claims 1 to 6.
CN202210893893.1A 2022-07-27 2022-07-27 Data access method, electronic device and storage medium Pending CN115269454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210893893.1A CN115269454A (en) 2022-07-27 2022-07-27 Data access method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210893893.1A CN115269454A (en) 2022-07-27 2022-07-27 Data access method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115269454A true CN115269454A (en) 2022-11-01

Family

ID=83769978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210893893.1A Pending CN115269454A (en) 2022-07-27 2022-07-27 Data access method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115269454A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132446A (en) * 2023-05-26 2023-11-28 摩尔线程智能科技(北京)有限责任公司 GPU data access processing method, device and storage medium
CN117217977A (en) * 2023-05-26 2023-12-12 摩尔线程智能科技(北京)有限责任公司 GPU data access processing method, device and storage medium

Similar Documents

Publication Publication Date Title
US11237728B2 (en) Method for accessing extended memory, device, and system
US9858192B2 (en) Cross-page prefetching method, apparatus, and system
CN108459826B (en) Method and device for processing IO (input/output) request
US9690953B2 (en) Generating efficient reads for a system having non-volatile memory
CN115269454A (en) Data access method, electronic device and storage medium
US9928166B2 (en) Detecting hot spots through flash memory management table snapshots
US7472227B2 (en) Invalidating multiple address cache entries
US20240143219A1 (en) Software-hardware combination method for internal mapping address query of zoned namespace
CN107818053A (en) Method and apparatus for accessing cache
US10459662B1 (en) Write failure handling for a memory controller to non-volatile memory
WO2018057273A1 (en) Reusing trained prefetchers
US11048644B1 (en) Memory mapping in an access device for non-volatile memory
CN106445472B (en) A kind of character manipulation accelerated method, device, chip, processor
US9727474B2 (en) Texture cache memory system of non-blocking for texture mapping pipeline and operation method of texture cache memory
US10534562B2 (en) Solid state drive
CN107783909B (en) Memory address bus expansion method and device
CN106649143B (en) Cache access method and device and electronic equipment
US11836092B2 (en) Non-volatile storage controller with partial logical-to-physical (L2P) address translation table
US11645209B2 (en) Method of cache prefetching that increases the hit rate of a next faster cache
CN115269199A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN115033500A (en) Cache system simulation method, device, equipment and storage medium
US20210064368A1 (en) Command tracking
US10977176B2 (en) Prefetching data to reduce cache misses
US20230112575A1 (en) Accelerator for concurrent insert and lookup operations in cuckoo hashing
CN114358180A (en) Pre-fetch training method of processor, processing device, processor and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination