WO2015172718A1 - Method, apparatus, and storage system for performing multiple accesses in a memory - Google Patents

Method, apparatus, and storage system for performing multiple accesses in a memory

Info

Publication number
WO2015172718A1
WO2015172718A1 (PCT/CN2015/078863; CN2015078863W)
Authority
WO
WIPO (PCT)
Prior art keywords
addresses
address
result
memory
predetermined condition
Prior art date
Application number
PCT/CN2015/078863
Other languages
English (en)
French (fr)
Inventor
陈文光
郑纬民
Original Assignee
清华大学
陈文光
郑纬民
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 (Tsinghua University), 陈文光, 郑纬民
Priority to EP22213885.1A priority Critical patent/EP4180972A1/en
Priority to EP15792922.5A priority patent/EP3144817A4/en
Priority to US15/310,984 priority patent/US10956319B2/en
Priority to JP2017512089A priority patent/JP6389323B2/ja
Publication of WO2015172718A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results

Definitions

  • Embodiments of the present invention relate to a method for performing multiple accesses in a memory, an apparatus for supporting multiple accesses in a memory, and a storage system, and more particularly to such a method, apparatus, and storage system capable of improving the access performance of a memory.
  • Random memory access has always been an important factor affecting computer performance.
  • In general, a single access to a DRAM (Dynamic Random Access Memory) requires hundreds of clock cycles.
  • Computer system architectures and programming languages have long used methods such as caches and prefetching to minimize random accesses to DRAM or to reduce the impact of random accesses on performance.
  • Embodiments of the present invention provide a method for performing multiple access in a memory, a device for supporting multiple access in a memory, and a storage system, which can improve access performance of the computer system.
  • According to one aspect, a method for performing multiple accesses in a memory is provided, comprising: receiving N addresses in a memory, where N is an integer greater than 1 and the N addresses are non-contiguous; performing a predetermined operation according to the N addresses; and outputting a result of the operation.
  • According to another aspect, an apparatus for supporting multiple accesses in a memory is provided, comprising: a receiving unit configured to receive N addresses in a memory, where N is an integer greater than 1 and the N addresses are non-contiguous; a processing unit configured to perform a predetermined operation according to the N addresses; and an output unit configured to output a result of the operation.
  • a storage system comprising the apparatus for supporting multiple accesses in a memory as previously described.
  • Multiple addresses in the memory can thus be operated on, and these addresses can be either contiguous or non-contiguous, which makes it possible to input and use the desired addresses just as the user requires.
  • Since the predetermined operation can be performed inside the memory according to the input addresses and the result of the operation is then output, not only is the function of the memory expanded, but the speed of data processing is also improved, saving time.
  • FIG. 1 is a schematic flow chart showing a method for performing multiple accesses in a memory according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart showing a method for performing multiple accesses in a memory according to another embodiment of the present invention
  • FIG. 3 is a schematic diagram showing the data structure of a graph;
  • FIG. 4 is a schematic flowchart showing a method for performing multiple accesses in a memory when performing predetermined operations on data stored at N addresses, according to an embodiment of the present invention
  • FIG. 5 is a schematic flowchart showing a method for performing multiple accesses in a memory when performing a predetermined operation on data stored at N addresses according to another embodiment of the present invention
  • FIG. 6 is a schematic flowchart showing a method for performing multiple accesses in a memory when a predetermined operation is performed on N addresses according to still another embodiment of the present invention
  • FIG. 7 is a schematic block diagram showing an apparatus for supporting multiple access in a memory according to an embodiment of the present invention.
  • FIG. 8 is a schematic block diagram showing an apparatus for supporting multiple access in a memory according to another embodiment of the present invention.
  • Besides DRAM, the embodiments of the present invention can also be applied to other storage devices and storage systems, such as SRAM (static random access memory), PCM (phase change memory), and FRAM (ferroelectric memory).
  • FIG. 1 is a schematic flow diagram showing a method 100 for multiple accesses in a memory in accordance with an embodiment of the present invention.
  • the method 100 can be performed in a processor.
  • N addresses in memory are received, where N is an integer greater than one and the N addresses are non-contiguous.
  • a predetermined operation is performed based on the N addresses.
  • multiple addresses in memory can be operated, and these addresses can be either continuous or non-contiguous, which allows the desired address to be entered and used just as desired by the user.
  • the predetermined operation can be performed inside the memory according to the input address and the result of the operation is output, not only the function of the memory is expanded, but also the speed of data processing is improved, saving time.
  • Before the predetermined operation on all of the addresses ends, a buffer may be utilized to store intermediate results, and when the predetermined operation on all of the addresses has finished, the addresses in the buffer are output as the result, which further increases the access speed.
  • FIG. 2 is a schematic flow diagram showing a method 200 for multiple accesses in a memory in accordance with another embodiment of the present invention.
  • the method 200 can be performed in a memory.
  • N addresses in memory are received, where N is an integer greater than one and the N addresses are non-contiguous.
  • a predetermined operation is performed based on the N addresses.
  • the result of the operation is stored in a buffer within the memory.
  • At 230, the outputting of the result of the operation in method 100 is implemented specifically as outputting the addresses in the buffer as the result.
  • the access speed can be further improved.
  • When data at one address in the memory is accessed once (input/output) from outside the memory, the time overhead of the handshake signals is large, about 60% of the total time, so the total time taken is on the order of hundreds of nanoseconds (for example, 200 ns). When the buffer is used, the handshake signals are greatly reduced because the same data is accessed inside the memory, so the total time taken for one access can be shortened to tens of nanoseconds or even a few nanoseconds, such as 1-2 ns.
  • The buffer can be configured with a structure similar to a cache to further increase the access speed. Therefore, utilizing the buffer shortens the access time.
  • The buffer may be newly partitioned from an existing buffer area in the memory, or it may be a new buffer area added to the memory. In the latter case, the memory hardware may need to be modified.
  • the data in the buffer can be emptied after each output of the result.
  • Figure 3 is a schematic diagram showing the data structure of a diagram. Although an undirected graph is shown in FIG. 3, it will be apparent to those skilled in the art that it can also be a directed graph and can also include weight information.
  • the figure includes 8 vertices V0, V1, V2, V3, V4, V5, V6, and V7 and 11 edges.
  • Typically, a one-dimensional array is used to store the vertex data and a two-dimensional array is used to store the edge data.
  • Vertex data can include a variety of information.
  • When performing a traversal of the graph, the vertex data can indicate whether the vertex has been traversed; for example, 0 means not traversed and 1 means traversed.
  • In an application involving vertex levels, the vertex data can represent the level of the vertex relative to the currently designated center vertex.
  • the embodiments of the present invention are not limited thereto, and those skilled in the art may understand that the vertex data may also include any other suitable information.
  • In a graph, any two vertices can be associated with each other, so when accessing the data of the graph in memory, the order in which vertices are accessed is unpredictable; the accesses are highly random and difficult to cache, resulting in slower access.
  • For example, the vertices may be accessed in the order V2→V7 in one operation, in the order V3→V7 in the next operation, and in the order V5→V7 in yet another operation.
  • each address may be determined by a base address (base_address) and an offset, where the offset indicates the distance of the address from the base address.
  • a plurality of offsets may be defined in the form of an array, such as bias[i], where i is an integer and 0 ⁇ i ⁇ N-1.
  • For example, the number of the vertex may be used as the address index, and the offset can be calculated by multiplying the address index by the size of an address element, thereby further determining the address of the vertex.
  • Using the address index and the address element size is convenient for the user, because in most cases the user does not know the exact address of a vertex but does know the number of each vertex. Therefore, by using the correspondence between a vertex's number and its address index, the actual address can be determined quickly and conveniently. Compared with a scheme in which the actual address must be entered, this greatly reduces the time required to enter data, and it also lets the user check whether the input vertices are correct, which is user friendly.
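The address computation just described can be sketched in software as follows. This is a minimal illustration assuming the relation address = base_address + bias_index[i] × element_size stated elsewhere in this document; the function name `resolve_addresses` is hypothetical, not from the patent.

```python
def resolve_addresses(base_address, bias_index, element_size):
    """Compute the N actual addresses from a base address, a list of
    address indexes, and the size of one address element.

    Sketch only: each address is the base address plus an offset,
    where offset = address index * element size."""
    return [base_address + i * element_size for i in bias_index]

# Example: vertices V0, V2, V4, V7 stored as 4-byte elements from base 0x1000.
addrs = resolve_addresses(0x1000, [0, 2, 4, 7], 4)
print([hex(a) for a in addrs])  # ['0x1000', '0x1008', '0x1010', '0x101c']
```

Note that the indexes need not be contiguous or sorted; any permutation of vertex numbers yields the corresponding addresses.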
  • Here, "non-contiguous" is broad: it includes not only completely discrete vertices or addresses, such as the four vertices V0, V2, V4, and V7 above, but also partially contiguous vertices or addresses, such as the five vertices consisting of vertices V0 to V4 together with vertex V7.
  • Although the N addresses are input in ascending order in the above example, since the input addresses may be non-contiguous, the addresses may be input in any order and need not follow an increasing or decreasing order.
  • performing the predetermined operation according to the N addresses in 120 or 220 may include performing a predetermined operation on the data stored at the N addresses.
  • FIG. 4 is a schematic flow diagram showing a method 400 for multiple accesses in memory when performing predetermined operations on data stored at N addresses, in accordance with an embodiment of the present invention.
  • the method 400 can be performed in a memory.
  • N addresses in a memory to be accessed are received, where N is an integer greater than one and the N addresses are non-contiguous.
  • each of the N addresses is accessed and a determination is made as to whether the data stored at the address satisfies a predetermined condition.
  • one or more of the N addresses satisfying a predetermined condition are output as a result.
  • The N addresses may be non-contiguous; of course, those skilled in the art will understand that these N addresses can also be contiguous.
  • Multiple addresses in the memory can be accessed, and the returned result includes one or more addresses that satisfy the condition; thus, compared with conventional operations that can access only one address at a time, the access speed is greatly improved, and the access performance of the computer system can thereby be improved.
  • the addresses to be accessed can be either contiguous or non-contiguous, which makes it possible to access the desired address just as the user desires. Further, since it is possible to judge whether or not the data at each address satisfies the condition inside the memory, the time for input/output is saved, and the processing speed is improved.
  • A buffer may be utilized to store intermediate results before access to all of the addresses ends, and when access to all of the addresses has ended, the addresses in the buffer are output as the result, further increasing the access speed.
  • FIG. 5 is a schematic flow diagram showing a method 500 for multiple accesses in memory when performing predetermined operations on data stored at N addresses, in accordance with another embodiment of the present invention.
  • the method 500 can be performed in a memory.
  • N addresses in the memory to be accessed are received, where N is an integer greater than one.
  • each of the N addresses is accessed and a determination is made as to whether the data stored at the address satisfies a predetermined condition.
  • the address of the data satisfying the predetermined condition is stored in a buffer in the memory before the access to all of the N addresses is completed.
  • At 530, the outputting of one or more of the N addresses that satisfy the predetermined condition is implemented specifically as outputting the addresses in the buffer as the result.
  • The buffer can also be configured (for example, with a cache-like structure) to further improve the access speed. Therefore, when a buffer is used to access multiple addresses in memory, the access time is shortened.
  • In the methods above, performing a predetermined operation on the data stored at the N addresses may include accessing the data at these addresses to determine whether the data stored at each address satisfies a predetermined condition.
  • the predetermined conditions herein may be arbitrarily specified by the user according to actual needs.
  • For example, in the case of a traversal of the graph, the predetermined condition may indicate whether a vertex has been traversed, and the addresses (vertex numbers) of the vertices that have not been traversed may be returned as the result.
  • As another example, the predetermined condition may indicate whether a vertex has been marked with a level, or whether it belongs to a predetermined level; the addresses (vertex numbers) of the vertices that have not been marked with a level, of the vertices of a specified level, or even of the vertices of several specified levels may be returned as the result.
  • Specifically, an operation including a relational operation and/or a logical operation may be performed on the data and a predetermined condition value, and when the result of the operation indicates true, it is determined that the predetermined condition is satisfied.
  • The relational operations may include, but are not limited to: equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to. The logical operations may include, but are not limited to: AND, OR, and XOR.
  • The embodiments of the present invention are not limited thereto; those skilled in the art will understand that the operations herein may also include any suitable operations, whether existing or developed in the future.
  • In addition, the original value of the data may be replaced with a new value, and the new value may be a fixed value for all N addresses or a function of the original value of the data at each address.
  • The new value may also be a set of values corresponding to one or more of the N addresses; such a set of values may be set by the user or specified or invoked internally by the system, so that a different value can be written to each address, or to every few addresses, that satisfies the predetermined condition.
  • base_address represents a base address
  • bias_index[i] represents a set of address indexes
  • element_size represents an address element size
  • op represents an operation performed
  • condition_value represents a predetermined condition value
  • new_value represents a new value.
  • the operation is similar to the conventional Compare and Swap (CAS) operation.
  • embodiments of the present invention can perform such operations on multiple addresses, and these addresses can be non-contiguous.
  • Each qualifying bias_index[i] is temporarily stored in the buffer, and the addresses in the buffer are not output as the result until access to all of the addresses has ended.
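A minimal software model of this CAS-like multi-access might look as follows. The memory is modeled as a Python list and `op` as a comparison function; the function name `multi_check_and_swap` is hypothetical, and the real operation takes place inside the memory device, so this sketch only illustrates the semantics, not the hardware behavior.

```python
import operator

def multi_check_and_swap(mem, bias_index, op, condition_value, new_value):
    """Model of the CAS-like multi-access described above.

    For each address index, test the stored data against condition_value
    using op; indexes that satisfy the condition are collected in a
    buffer (and their data replaced by new_value), and the buffer is
    output only after all N addresses have been processed."""
    buffer = []                      # models the buffer inside the memory
    for i in bias_index:
        if op(mem[i], condition_value):
            buffer.append(i)         # store the qualifying bias_index[i]
            mem[i] = new_value       # optionally replace with a new value
    return buffer                    # output only after all accesses end

# Example: find untraversed vertices (flag == 0) and mark them traversed.
vertex_flags = [1, 0, 1, 0, 0, 1, 0, 1]          # 0 = not traversed
result = multi_check_and_swap(vertex_flags, [0, 2, 4, 7], operator.eq, 0, 1)
print(result)           # [4] -- of V0, V2, V4, V7, only V4 was untraversed
print(vertex_flags[4])  # 1  -- V4 is now marked as traversed
```

Unlike a conventional compare-and-swap, which targets a single location, the operation here sweeps N possibly non-contiguous locations in one call.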
  • The operations such as AND, OR, and XOR above are merely exemplary; embodiments of the present invention are not limited thereto and may also include any other appropriate operations and combinations of operations, including those developed in the future.
  • The predetermined condition may be not only a fixed predetermined condition value condition_value, but also a relational expression, for example an expression involving the predetermined condition value and the original value at the address, and so on.
  • In this way, the determination of whether the data stored at an address satisfies the predetermined condition is very flexible and can involve various operations, so various needs can be satisfied.
  • Although the predetermined condition is the same for all N addresses in the above description, the embodiments of the present invention are not limited thereto; in some embodiments, the predetermined condition may differ among the N addresses.
  • In this case, the above op and condition_value may include elements respectively corresponding to each of the N addresses; for example, they may be provided in the form of arrays op[i] and condition_value[i], where 0≤i≤N-1.
  • FIG. 6 is a schematic flow diagram showing a method 600 for multiple accesses in memory when performing predetermined operations on data stored at N addresses, in accordance with yet another embodiment of the present invention.
  • the method 600 can be performed in a memory.
  • N addresses in memory are received, where N is an integer greater than one and the N addresses are non-contiguous.
  • At 620, at least one of an arithmetic operation, a relational operation, and a logical operation is performed on the data stored at the N addresses.
  • the result of the operation is output as a result.
  • For example, arithmetic operations such as addition, subtraction, and multiplication may be performed on the data stored at the N addresses, and the resulting sum, difference, or product may be output.
  • A relational operation can likewise be performed on the data stored at the N addresses to obtain, for example, the maximum, minimum, or median value of the data.
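The pseudocode for the summation is not reproduced in this excerpt; the sketch below shows plausible semantics for such reductions over N possibly non-contiguous addresses (the function name `multi_reduce` is hypothetical, and in the patent the reduction would run inside the memory rather than in host software).

```python
def multi_reduce(mem, bias_index, op):
    """Apply a reduction (e.g. sum, min, max) over the data stored at
    the given, possibly non-contiguous, address indexes."""
    values = [mem[i] for i in bias_index]  # gather data at the N addresses
    return op(values)                      # reduce to a single result

data = [10, 99, 3, 99, 25, 99, 99, 7]
print(multi_reduce(data, [0, 2, 4, 7], sum))  # 45 (10 + 3 + 25 + 7)
print(multi_reduce(data, [0, 2, 4, 7], min))  # 3
print(multi_reduce(data, [0, 2, 4, 7], max))  # 25
```

Returning only the reduced value keeps the amount of data crossing the memory interface at a single result instead of N values, which matches the speed benefit the text describes.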
  • When performing the above operations on the data stored at the N addresses, the buffer can also be utilized to increase speed: intermediate results are temporarily stored in the buffer before the operations on all N addresses are completed, and after the operations on the N addresses are completed, the result in the buffer is output as the operation result. This is similar to some of the previous embodiments, so a detailed description is omitted for brevity.
  • FIG. 7 is a schematic block diagram showing an apparatus 700 for supporting multiple accesses in a memory, in accordance with an embodiment of the present invention.
  • a device 700 may also be referred to as a Multi-Random Access Memory with Processing Function (MRAMPF).
  • As shown in FIG. 7, the apparatus 700 for supporting multiple accesses in the memory may include: a receiving unit 710, a processing unit 720, and an output unit 730.
  • the receiving unit 710 is configured to receive N addresses in the memory, where N is an integer greater than 1 and the N addresses are non-contiguous.
  • the processing unit 720 is configured to perform a predetermined operation according to the N addresses.
  • the output unit 730 is for outputting the result of the operation.
  • Multiple addresses in the memory can thus be operated on, and these addresses can be either contiguous or non-contiguous, which makes it possible to input and use the desired addresses just as the user requires.
  • Since the predetermined operation can be performed inside the memory according to the input addresses and the result of the operation is then output, not only is the function of the memory expanded, but the speed of data processing is also improved, saving time.
  • Before the predetermined operation on all of the addresses ends, a buffer may be utilized to store intermediate results, and when the predetermined operation on all of the addresses has finished, the addresses in the buffer are output as the result, further increasing the access speed.
  • FIG. 8 is a schematic block diagram showing an apparatus 800 for supporting multiple accesses in a memory, in accordance with another embodiment of the present invention.
  • The apparatus 800 differs from the apparatus 700 shown in FIG. 7 in that it further includes a buffer 825 for storing intermediate results before the processing unit 820 completes operations on all of the N addresses.
  • The receiving unit 810, the processing unit 820, and the output unit 830 included in the apparatus 800 illustrated in FIG. 8 correspond respectively to the receiving unit 710, the processing unit 720, and the output unit 730 illustrated in FIG. 7, have similar structures, and perform similar functions; the details are not repeated here.
  • The output unit 830 outputs the result of the operation stored in the buffer 825. Therefore, according to this embodiment of the present invention, since the intermediate results are temporarily stored in the buffer before the operations on all of the addresses are completed, the access speed can be further improved.
  • The buffer 825 here may be newly partitioned from an existing buffer area in the memory, or it may be a new buffer area added to the memory. In the latter case, the memory hardware may need to be modified.
  • In addition, the buffer can be emptied each time its contents are output.
  • processing unit 720 or 820 can determine each address by a base address and an offset, where the offset indicates the distance of the address from the base address.
  • multiple offsets may be defined in the form of an array, such as bias[i], where i is an integer and 0 ⁇ i ⁇ N-1.
  • Receiving the N offsets may further include receiving an address element size (for example, 4 bytes) and N address indexes.
  • the determination may be made by the receiving unit 710 or 810 and the determined N addresses are transmitted to the processing unit 720 or 820.
  • all of the N addresses may be transmitted to processing unit 720 or 820 after receiving unit 710 or 810 receives and determines all N addresses.
  • Alternatively, the receiving unit 710 or 810 may transmit each address to the processing unit 720 or 820 as soon as it is determined.
  • the processing unit 720 or 820 performing a predetermined operation according to the N addresses may include performing a predetermined operation on data stored at the N addresses.
  • the processing unit 720 or 820 can access each of the N addresses and determine whether data stored at the address satisfies a predetermined condition, and the output unit 730 or 830 may output one or more of the N addresses that satisfy a predetermined condition as a result.
  • the address of the data satisfying the predetermined condition is stored in a buffer in the memory before completing access to all of the N addresses. After completing access to all of the N addresses, the output unit 730 or 830 can output the address in the buffer as a result.
  • The processing unit 720 or 820 determining whether the data stored at an address satisfies the predetermined condition may include performing an operation including a relational operation and/or a logical operation on the data and a predetermined condition value, and determining that the predetermined condition is satisfied when the result of the operation indicates true.
  • The relational operations may include, but are not limited to: equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to. The logical operations may include, but are not limited to: AND, OR, and XOR.
  • In addition, the processing unit 720 or 820 can replace the original value of the data with a new value, which can be a fixed value or a function of the original value.
  • the predetermined condition may be the same or different for N addresses.
  • N may depend on the actual situation, such as user requirements, hardware design, computing power, etc., for example N may be 32, 64, and the like. N can be appropriately selected so as not to affect the processing performance of the memory.
  • Further, the processing unit 720 or 820 may perform at least one of an arithmetic operation, a relational operation, and a logical operation on the data stored at the N addresses, and the output unit 730 or 830 may output the result of the operation as the result.
  • base_address represents a base address
  • bias_index[i] represents a set of address indexes, which may be continuous or non-contiguous
  • element_size represents an address element size (for example, 4 Bytes)
  • Function() represents a predetermined operation to be performed
  • parameter represents the parameters required for the predetermined operation; there may be one or more such parameters. output represents the address to which the result is to be output.
  • The N addresses actually to be operated on can be determined as base_address + bias_index[i] * element_size, where bias_index[i] * element_size can also be replaced by bias[i].
  • Each of op, condition_value, and new_value can be a single element, a set of elements, or even an expression.
  • The Function may be, for example, Function(op, condition_value, new_value), where op represents the operation performed, condition_value represents a predetermined condition value, and new_value represents a new value.
  • The Function may also be, for example, Function(op), or even Function(op1, op2, op3, ...), or Function(op1[i], op2[i], op3[i], ...).
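To make the interface above concrete, here is one hypothetical way the Function shape with per-address operations op[i] and condition values condition_value[i] could be modeled in software. The identifiers mirror the document's pseudocode names, but the dispatch logic itself is an assumption, not the patent's implementation.

```python
import operator

# Map symbolic operation names to Python comparison functions (illustrative).
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}

def multi_access(base_address, bias_index, element_size, op, condition_value,
                 mem):
    """Model of the multi-access call described above.

    op and condition_value may each be a single value (the same condition
    for all N addresses) or per-address arrays op[i] / condition_value[i]."""
    n = len(bias_index)
    ops = op if isinstance(op, list) else [op] * n
    conds = (condition_value if isinstance(condition_value, list)
             else [condition_value] * n)
    output = []
    for k, idx in enumerate(bias_index):
        address = base_address + idx * element_size  # actual address (unused
        value = mem[idx]                             # in this list-based model)
        if OPS[ops[k]](value, conds[k]):
            output.append(idx)                       # qualifying address index
    return output

mem = [5, 8, 2, 9]
# A different condition per address: mem[0] > 3?  mem[2] == 2?  mem[3] < 4?
print(multi_access(0, [0, 2, 3], 4, ["gt", "eq", "lt"], [3, 2, 4], mem))  # [0, 2]
```

Passing scalars instead of lists reproduces the uniform-condition case described earlier, so one call shape covers both variants.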
  • embodiments of the present invention also include a storage system including the apparatus 700 or 800 for supporting multiple accesses in the memory as described above with reference to FIG. 7 or 8.
  • Only the portions related to the embodiments of the present invention are shown in FIGS. 7 and 8, but those skilled in the art will appreciate that the apparatuses shown in FIGS. 7 and 8 can include other necessary units.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of units is only a division by logical function; in actual implementation, there may be other manners of division. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

Abstract

Provided are a method for performing multiple accesses in a memory, an apparatus for supporting multiple accesses in a memory, and a storage system. The method includes: receiving N addresses in a memory, where N is an integer greater than 1 and the N addresses are non-contiguous (110); performing a predetermined operation according to the N addresses (120); and outputting a result of the operation (130). The performance of a computer system can thereby be improved, and desired addresses can be input and used just as the user requires.

Description

Method, Apparatus, and Storage System for Performing Multiple Accesses in a Memory
Technical Field
Embodiments of the present invention relate to a method for performing multiple accesses in a memory, an apparatus for supporting multiple accesses in a memory, and a storage system, and more particularly to a method for performing multiple accesses in a memory, an apparatus for supporting multiple accesses in a memory, and a storage system capable of improving the access performance of a memory.
Background Art
Random memory access has always been an important factor affecting computer performance. In general, one access to a DRAM (Dynamic Random Access Memory) requires hundreds of clock cycles. Computer system architectures and programming languages have long used techniques such as caches and prefetching to minimize random accesses to DRAM or to reduce the impact of random accesses on performance.
In recent years, big-data analysis has become an important application field. Big-data analysis applications make heavy use of data structures typified by graphs; methods such as caching and prefetching can hardly optimize accesses to such data structures, and under current processor and memory architectures a large number of random accesses are still generated.
Therefore, a solution capable of improving the access performance of a computer system is desired.
Summary of the Invention
Embodiments of the present invention provide a method for performing multiple accesses in a memory, an apparatus for supporting multiple accesses in a memory, and a storage system, which can improve the access performance of a computer system.
According to one aspect of the embodiments of the present invention, a method for performing multiple accesses in a memory is provided, including: receiving N addresses in a memory, where N is an integer greater than 1 and the N addresses are non-contiguous; performing a predetermined operation according to the N addresses; and outputting a result of the operation.
According to another aspect of the embodiments of the present invention, an apparatus for supporting multiple accesses in a memory is provided, including: a receiving unit configured to receive N addresses in a memory, where N is an integer greater than 1 and the N addresses are non-contiguous; a processing unit configured to perform a predetermined operation according to the N addresses; and an output unit configured to output a result of the operation.
According to yet another aspect of the embodiments of the present invention, a storage system is provided, including the apparatus for supporting multiple accesses in a memory described above.
Therefore, according to the embodiments of the present invention, multiple addresses in a memory can be operated on, and these addresses can be either contiguous or non-contiguous, which makes it possible to input and use the desired addresses just as the user requires. In addition, since the predetermined operation can be performed inside the memory according to the input addresses and the result of the operation is output, not only is the function of the memory expanded, but the speed of data processing is also improved, saving time.
Brief Description of the Drawings
The present invention will be more readily understood from the following detailed description with reference to the accompanying drawings, in which the same reference numerals designate units of the same structure, and in which:
FIG. 1 is a schematic flowchart showing a method for performing multiple accesses in a memory according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart showing a method for performing multiple accesses in a memory according to another embodiment of the present invention;
FIG. 3 is a schematic diagram showing the data structure of a graph;
FIG. 4 is a schematic flowchart showing a method for performing multiple accesses in a memory when a predetermined operation is performed on data stored at N addresses, according to a specific embodiment of the present invention;
FIG. 5 is a schematic flowchart showing a method for performing multiple accesses in a memory when a predetermined operation is performed on data stored at N addresses, according to another specific embodiment of the present invention;
FIG. 6 is a schematic flowchart showing a method for performing multiple accesses in a memory when a predetermined operation is performed on N addresses, according to yet another specific embodiment of the present invention;
FIG. 7 is a schematic block diagram showing an apparatus for supporting multiple accesses in a memory according to an embodiment of the present invention; and
FIG. 8 is a schematic block diagram showing an apparatus for supporting multiple accesses in a memory according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", or "includes" and/or "including", when used in this specification, specify the presence of stated features, regions, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components and/or groups thereof.
Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the following description, although a graph is taken as an example, those skilled in the art will understand that the embodiments of the present invention can also be applied to other data structures, such as trees, linked lists, and the like.
In addition to DRAM, the embodiments of the present invention can also be applied to other storage devices and storage systems, such as SRAM (Static Random Access Memory), PCM (Phase Change Memory), FRAM (Ferroelectric RAM), and so on.
Hereinafter, the embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart illustrating a method 100 for performing multiple accesses in a memory according to an embodiment of the present invention. The method 100 may be performed in a processor.
As shown in Fig. 1, at 110 of the method 100, N addresses in the memory are received, where N is an integer greater than 1 and the N addresses are non-contiguous.
At 120, a predetermined operation is performed according to the N addresses.
At 130, a result of the operation is output.
Therefore, according to the embodiments of the present invention, operations can be performed on multiple addresses in a memory, and these addresses may be either contiguous or non-contiguous, so that the desired addresses can be input and used exactly as the user requires. In addition, since the predetermined operation can be performed inside the memory according to the input addresses and the result of the operation can be output, the functionality of the memory is extended, the speed of data processing is increased, and time is saved.
Furthermore, since multiple addresses are received at a time and the predetermined operation is performed on them, according to another embodiment of the present invention, a buffer may be used to store intermediate results before the predetermined operation on all the addresses has finished, and the addresses in the buffer are output as the result when the predetermined operation on all the addresses has finished, thereby further improving the access speed.
Fig. 2 is a schematic flowchart illustrating a method 200 for performing multiple accesses in a memory according to another embodiment of the present invention. The method 200 may be performed in a memory.
As shown in Fig. 2, at 210 of the method 200, N addresses in the memory are received, where N is an integer greater than 1 and the N addresses are non-contiguous.
At 220, a predetermined operation is performed according to the N addresses.
At 225, the result of the operation is stored in a buffer inside the memory.
At 230, outputting the result of the operation as in method 100 is specifically: outputting the addresses in the buffer as the result.
According to the embodiments of the present invention, since the intermediate results are temporarily stored in the buffer before the operations on all the addresses are completed, the access speed can be further improved. For example, when data stored at an address is accessed, one access (input/output) from outside the memory to the data at one address in the memory takes a total of roughly a few hundred nanoseconds (e.g., 200 ns), because the handshake signals account for a large share of the time overhead, about 60% of the total. When the buffer is used, however, the handshake signals are greatly reduced, so one access to the same data inside the memory can be shortened to tens of nanoseconds or even a few nanoseconds, e.g., 1-2 ns. In addition, the buffer may be configured with a structure similar to that of a cache to further increase the access speed. Therefore, the access time is shortened when the buffer is used.
Here, the buffer may be newly partitioned from an existing buffer area inside the memory, or may be a new buffer area added to the memory. In the latter case, the hardware of the memory may need to be modified. In addition, the data in the buffer may be cleared each time the result has been output.
Hereinafter, the embodiments of the present invention will be described in detail by taking the graph data structure as an example, but those skilled in the art will understand that the present invention can obviously also be applied to data structures other than graphs, such as trees, linked lists, and so on.
Fig. 3 is a schematic diagram illustrating a graph data structure. Although Fig. 3 shows an undirected graph, those skilled in the art will understand that it may obviously also be a directed graph and may further include weight information.
As shown in Fig. 3, the graph includes 8 vertices V0, V1, V2, V3, V4, V5, V6 and V7 and 11 edges. Typically, a one-dimensional array is used to store the vertex data and a two-dimensional array is used to store the edge data. An illustrative vertex array V[j] (0≤j≤7) is, for example, as shown below.
V0 V1 V2 V3 V4 V5 V6 V7
The vertex data may include various kinds of information. In one example, when a graph traversal is performed, the vertex data may indicate whether the vertex has been traversed, e.g., 0 for not traversed and 1 for traversed. In another example, in an application that assigns levels to vertices, the vertex data may indicate at which level the vertex is relative to the currently designated central vertex. Of course, the embodiments of the present invention are not limited thereto, and those skilled in the art will understand that the vertex data may also include any other suitable information.
As can be seen from Fig. 3, in a graph data structure any two vertices (data elements) may be associated with each other, so that when the memory is used to access the data in the graph, the order in which vertices are accessed cannot be determined in advance; it is highly random and difficult to cache, resulting in slow access. For example, in the graph structure shown in Fig. 3, if vertex V7 is to be accessed, it may be accessed in the order V2→V7 in one operation, in the order V3→V7 in the next operation, and in the order V5→V7 in yet another operation.
In an exemplary embodiment of the present invention, each address may be determined by a base address (base_address) and an offset, where the offset indicates the distance between the address and the base address. Specifically, multiple offsets may be defined in the form of an array, e.g., bias[i], where i is an integer and 0<i≤N-1. For example, suppose the base address is the address of vertex V0 (e.g., 0000H), N=4 vertices V0, V2, V4 and V7 are to be operated on, and the offsets of vertices V2, V4 and V7 are known to be 8 bytes, 16 bytes and 28 bytes, respectively. Then N=4 offsets may be defined, namely bias[0]=0, bias[1]=8, bias[2]=16 and bias[3]=28, so that receiving the N addresses in the memory at 110 or 210 may further include: receiving the base address and the N offsets; and determining each of the N addresses according to: i-th address = base address + i-th offset. The following addresses are thus obtained:
the address of vertex V0, i.e., the 0th address = 0000H + 0 bytes;
the address of vertex V2, i.e., the 1st address = 0000H + 8 bytes;
the address of vertex V4, i.e., the 2nd address = 0000H + 16 bytes; and
the address of vertex V7, i.e., the 3rd address = 0000H + 28 bytes.
In another exemplary embodiment, instead of directly giving the offset of a vertex's address from the base address, the vertex number may be used as the offset. In this case, since the address index of a vertex usually represents the number of that vertex, the offset can be calculated as the product of the address index and the address element size, from which the vertex's address is then determined.
Specifically, for example, still assume that the base address is the address of vertex V0 (e.g., 0000H) and that N=4 vertices V0, V2, V4 and V7 are to be accessed, where 0, 2, 4 and 7 represent the address indexes (vertex numbers). An array of offset indexes bias_index[i] (0<i≤N-1) may then be defined, namely bias_index[0]=0, bias_index[1]=2, bias_index[2]=4 and bias_index[3]=7. Further, assume that the address element size used to store an address is 4 bytes. Receiving the N offsets may thus further include receiving the address element size (4 bytes) and the N address indexes, and each of the N addresses may be determined according to: i-th address = base address + i-th address index × address element size. The following addresses are thus obtained:
the address of vertex V0, i.e., the 0th address = 0000H + 0 × 4 bytes;
the address of vertex V2, i.e., the 1st address = 0000H + 2 × 4 bytes;
the address of vertex V4, i.e., the 2nd address = 0000H + 4 × 4 bytes; and
the address of vertex V7, i.e., the 3rd address = 0000H + 7 × 4 bytes.
Determining addresses by address index and address element size is convenient for the user: in most cases the user cannot know the exact address of a vertex, but does know each vertex's number. Using the correspondence between a vertex's number and its address index, the actual address can thus be determined quickly and easily. Compared with a solution that requires the actual addresses to be input, this greatly shortens the time for inputting data, and it makes it easy for the user to check whether the input vertices are correct, which is user-friendly.
It can also be seen that the embodiments of the present invention place no requirement on whether the vertices to be operated on are contiguous; any contiguous or non-contiguous vertices can therefore be accessed, making the access more targeted.
In addition, as those skilled in the art will understand, the term "non-contiguous" has a broad meaning: it covers not only absolutely discrete vertices or addresses, such as the 4 vertices V0, V2, V4 and V7 above, but also partially contiguous vertices or addresses, such as the 5 vertices consisting of V0 through V4 plus V7.
It is worth noting that although the N addresses are input in increasing order in the example above, since the input addresses may be non-contiguous, they may be input in any order, and need not follow an increasing or decreasing order.
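The address computation described above can be sketched as a small software model. This is purely illustrative (an assumption of this edit, not the in-memory hardware of the invention); the names base_address, bias_index and element_size follow the document's own pseudocode:

```python
# Illustrative model: i-th address = base address + i-th address index * element size.

def nth_address(base_address: int, bias_index: list[int], element_size: int, i: int) -> int:
    """Return the i-th target address in bytes."""
    return base_address + bias_index[i] * element_size

if __name__ == "__main__":
    # Vertices V0, V2, V4, V7 from the example: base 0000H, 4-byte elements.
    bias_index = [0, 2, 4, 7]
    addrs = [nth_address(0x0000, bias_index, 4, i) for i in range(4)]
    print(addrs)  # byte offsets 0, 8, 16 and 28, matching the example above
```

Running this reproduces the offsets 0, 8, 16 and 28 bytes computed in the text.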
According to an exemplary embodiment of the present invention, performing the predetermined operation according to the N addresses at 120 or 220 may include: performing the predetermined operation on the data stored at the N addresses.
Fig. 4 is a schematic flowchart illustrating a method 400 for performing multiple accesses in a memory when a predetermined operation is performed on the data stored at N addresses, according to a specific implementation of the present invention. The method 400 may be performed in a memory.
As shown in Fig. 4, at 410 of the method 400, N addresses to be accessed in the memory are received, where N is an integer greater than 1 and the N addresses are non-contiguous.
At 420, each of the N addresses is accessed, and it is determined whether the data stored at that address satisfies a predetermined condition.
At 430, one or more of the N addresses that satisfy the predetermined condition are output as the result.
In the method 400, the N addresses may be non-contiguous. Of course, those skilled in the art will understand that the N addresses may also be contiguous.
Thus, according to this specific implementation of the present invention, multiple addresses in the memory can be accessed, and the returned result includes the one or more addresses that satisfy the condition; compared with conventional operations that can access only one address at a time, the access speed is greatly increased, and the access performance of the computer system can thereby be improved. Moreover, the addresses to be accessed may be either contiguous or non-contiguous, so that the desired addresses can be accessed exactly as the user requires. In addition, since whether the data at each address satisfies the condition can be determined inside the memory, input/output time is saved and the processing speed is increased.
Furthermore, since multiple addresses are accessed at a time, as described above, according to another specific implementation of the present invention, a buffer may be used to store intermediate results before the accesses to all the addresses have finished, and the addresses in the buffer are output as the result when the accesses to all the addresses have finished, thereby further improving the access speed.
Fig. 5 is a schematic flowchart illustrating a method 500 for performing multiple accesses in a memory when a predetermined operation is performed on the data stored at N addresses, according to another specific implementation of the present invention. The method 500 may be performed in a memory.
As shown in Fig. 5, at 510 of the method 500, N addresses to be accessed in the memory are received, where N is an integer greater than 1.
At 520, each of the N addresses is accessed, and it is determined whether the data stored at that address satisfies a predetermined condition.
At 525, before the accesses to all of the N addresses are completed, the addresses of the data satisfying the predetermined condition are stored in a buffer inside the memory.
At 530, outputting the one or more of the N addresses that satisfy the predetermined condition as the result is specifically: outputting the addresses in the buffer as the result.
Thus, according to this specific implementation of the present invention, the access speed is further improved by providing the buffer: when the buffer is used to access multiple addresses in the memory, the access time is shortened.
After the N addresses to be accessed have been received, the method for performing accesses in a memory when a predetermined operation is performed on the data stored at N addresses according to the specific implementations of the present invention (method 400 or method 500) may access the data at these addresses to determine whether the data stored at each address satisfies the predetermined condition.
In one example, the predetermined condition here may be specified arbitrarily by the user according to actual needs. In one example, in the case of a graph traversal, the predetermined condition may indicate whether a vertex has been traversed, and the addresses (vertex numbers) of the vertices that have not been traversed may be returned as the result. In another example, in the case of assigning levels to other vertices with respect to a certain predetermined vertex, the predetermined condition may indicate whether a vertex has been assigned a level, or at which level it is relative to the predetermined vertex, and the addresses (vertex numbers) of the vertices that have not been assigned a level, or of the vertices at a specified level, or even of the vertices at several specified levels, may be returned as the result.
Therefore, when determining in the specific implementations of the present invention whether the data stored at an address satisfies the predetermined condition, an operation including a relational operation and/or a logical operation may be performed on the data and a predetermined condition value, and the predetermined condition may be determined to be satisfied when the operation result is true. Here, the relational operations may include, but are not limited to, equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to, and the logical operations may include, but are not limited to, AND, OR and XOR. Of course, the embodiments of the present invention are not limited thereto, and those skilled in the art will understand that the operations here may also include any other appropriate operations, whether existing or developed in the future.
In addition, when the predetermined condition is determined to be satisfied, the original value of the data may be replaced with a new value, and the new value may be a value fixed for the N addresses or a function of the original value of the data at each address. Alternatively, the new value may be a group of values corresponding to one or more of the N addresses; such a group of values may be set by the user, or specified or invoked internally by the system as appropriate, so that a different value can be written to each qualifying address or to each group of several qualifying addresses.
For example, for each address to be accessed, the operation represented by the following pseudocode may be performed:
if(*(base_address+bias_index[i]*element_size)op condition_value==true)
{
*(base_address+bias_index[i]*element_size)=new_value;
push bias_index[i]
}
Here, base_address denotes the base address, bias_index[i] denotes a group of address indexes, element_size denotes the address element size, op denotes the operation to be performed, condition_value denotes the predetermined condition value, and new_value denotes the new value.
Here, when condition_value represents the original value of the data at the address and op is "==" (equal to), this operation is similar to the conventional Compare And Swap (CAS) operation. Unlike conventional CAS, however, the embodiments of the present invention can perform such an operation on multiple addresses, and these addresses may be non-contiguous. Moreover, in the embodiments of the present invention, each qualifying bias_index[i] is temporarily stored in the buffer before the accesses to the multiple addresses have finished, and the addresses in the buffer are output as the result only after the accesses to all addresses have finished.
It will be appreciated that it is also possible to perform no operation on the data at an address that satisfies the condition and merely return its address, for example:
if(*(base_address+bias_index[i]*element_size)op condition_value==true)
{
push bias_index[i]
}
In addition to the relational operation denoting equality, "op" in the above pseudocode may also be another relational operation, such as greater than ">", greater than or equal to ">=", less than "<", less than or equal to "<=" and not equal to "!=", and "op" may also be a logical operation, including AND, OR, XOR, and so on. However, those skilled in the art should understand that the above relational and logical operations are merely exemplary; the embodiments of the present invention are not limited thereto and may also include combinations of any other appropriate operations, as well as operations developed in the future.
In a further embodiment, the predetermined condition may be not merely a fixed predetermined condition value "condition_value" but also a relational expression, such as an expression combining the predetermined condition value with the original value at the address, and so on.
Therefore, according to the embodiments of the present invention, the evaluation of whether the data stored at an address satisfies the predetermined condition is very flexible and encompasses a variety of operations, and can thus meet various needs.
Furthermore, although in the above description the predetermined condition is the same for all N addresses, the embodiments of the present invention are not limited thereto; in some embodiments, the predetermined condition may differ across the N addresses. In other words, the above "op" and "condition_value" may include elements respectively corresponding to each of the N addresses, e.g., provided in the form of arrays op[i] and condition_value[i], where 0<i≤N-1.
Fig. 6 is a schematic flowchart illustrating a method 600 for performing multiple accesses in a memory when a predetermined operation is performed on the data stored at N addresses, according to yet another specific implementation of the present invention. The method 600 may be performed in a memory.
As shown in Fig. 6, at 610 of the method 600, N addresses in the memory are received, where N is an integer greater than 1 and the N addresses are non-contiguous.
At 620, at least one of an arithmetic operation, a relational operation and a logical operation is performed on the data stored at the N addresses.
At 630, the operation result is output as the result.
In one example, arithmetic operations such as one or more of addition, subtraction and multiplication may be performed on the data stored at the N addresses, and the resulting sum, difference, product, etc. may be output. For example, the following pseudocode shows the case of summation:
{sum=0;
for(i=0;i<N;i++)
sum=sum+*(base_address+bias_index[i]*element_size);
output sum
}
In addition, relational operations may also be performed on the data stored at the N addresses to obtain, for example, the maximum, minimum or median of the data.
Of course, although the above pseudocode shows only one of an arithmetic operation, a relational operation and a logical operation being performed on the data stored at the N addresses, those skilled in the art will understand that at least one of an arithmetic operation, a relational operation and a logical operation may also be performed on the data stored at the N addresses.
Furthermore, when the above operations are performed on the data stored at the N addresses, the buffer can likewise be used to increase the speed; that is, before the operations on all N addresses are completed, the intermediate results are temporarily stored in the buffer, and once the operations on the N addresses are completed, the results in the buffer are output as the operation result. This is similar to some of the implementations above, so a detailed description is omitted for brevity.
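The summation pseudocode above can be sketched in ordinary Python as follows; this is a software model for illustration (the invention performs the reduction inside the memory itself):

```python
# Gather the values stored at N possibly non-contiguous indexes and sum them,
# mirroring: sum = sum + *(base_address + bias_index[i] * element_size).

def gather_sum(memory: list[int], bias_index: list[int]) -> int:
    total = 0
    for i in bias_index:
        total += memory[i]
    return total

if __name__ == "__main__":
    data = [10, 1, 20, 2, 30, 3, 4, 40]
    print(gather_sum(data, [0, 2, 4, 7]))  # 10 + 20 + 30 + 40 = 100
```

Replacing the accumulation with `max`/`min` models the relational variants mentioned in the text.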
Fig. 7 is a schematic block diagram illustrating an apparatus 700 supporting multiple accesses in a memory according to an embodiment of the present invention. Such an apparatus 700 may also be referred to as a Multi-Random Access Memory with Processing Function (MRAMPF).
As shown in Fig. 7, the apparatus 700 supporting multiple accesses in a memory may include: a receiving unit 710, a processing unit 720 and an output unit 730.
The receiving unit 710 is configured to receive N addresses in the memory, where N is an integer greater than 1 and the N addresses are non-contiguous. The processing unit 720 is configured to perform a predetermined operation according to the N addresses. The output unit 730 is configured to output a result of the operation.
Therefore, according to the embodiments of the present invention, operations can be performed on multiple addresses in a memory, and these addresses may be either contiguous or non-contiguous, so that the desired addresses can be input and used exactly as the user requires. In addition, since the predetermined operation can be performed inside the memory according to the input addresses and the result of the operation can be output, the functionality of the memory is extended, the speed of data processing is increased, and time is saved.
Furthermore, since multiple addresses are received at a time and the predetermined operation is performed on them, according to another embodiment of the present invention, a buffer may be used to store intermediate results before the predetermined operation on all the addresses has finished, and the addresses in the buffer are output as the result when the predetermined operation on all the addresses has finished, thereby further improving the access speed.
Fig. 8 is a schematic block diagram illustrating an apparatus 800 supporting multiple accesses in a memory according to another embodiment of the present invention. The apparatus 800 differs from the apparatus 700 shown in Fig. 7 in that it further includes a buffer 825 for storing intermediate results before the processing unit 820 completes the operations on all of the N addresses. The receiving unit 810, processing unit 820 and output unit 830 included in the apparatus 800 shown in Fig. 8 correspond to the receiving unit 710, processing unit 720 and output unit 730 shown in Fig. 7, respectively; they have similar structures and perform similar functions, so their details are not repeated here.
Thus, in one example, the output unit 830 outputs the result of the operation stored in the buffer 825. According to the embodiments of the present invention, since the intermediate results are temporarily stored in the buffer before the operations on all the addresses are completed, the access speed can be further improved.
As described above, the buffer 825 here may be newly partitioned from an existing buffer area inside the memory, or may be a new buffer area added to the memory. In the latter case, the hardware of the memory may need to be modified. In addition, the data in the buffer may be cleared each time the data in the buffer has been output.
In one example, the processing unit 720 or 820 may determine each address by a base address and an offset, where the offset indicates the distance between the address and the base address. In this case, the receiving unit 710 or 810 may receive the base address and N offsets as the N addresses in the memory, and the processing unit 720 or 820 determines each of the N addresses according to: i-th address = base address + i-th offset, 0<i≤N-1. For example, as described above, multiple offsets may be defined in the form of an array, e.g., bias[i], where i is an integer and 0<i≤N-1.
In addition, the N offsets received by the receiving unit 710 or 810 may further include an address element size and N address indexes, and the processing unit 720 or 820 may determine each of the N addresses according to: i-th address = base address + i-th address index × address element size. For example, when the vertices to be accessed are V0, V2, V4 and V7, where 0, 2, 4 and 7 represent the address indexes (vertex numbers), an array of offset indexes bias_index[i] (0<i≤N-1) may be defined, namely bias_index[0]=0, bias_index[1]=2, bias_index[2]=4 and bias_index[3]=7. Meanwhile, assuming that the address element size used to store an address is 4 bytes, receiving the N offsets may further include receiving the address element size (4 bytes) and the N address indexes.
Of course, besides being determined by the processing unit 720 or 820, the N addresses to be accessed may also be determined by the receiving unit 710 or 810, which then passes the determined N addresses to the processing unit 720 or 820.
In one example, the receiving unit 710 or 810 may pass all N addresses to the processing unit 720 or 820 once it has received and determined all of them. Alternatively, the receiving unit 710 or 810 may pass each address to the processing unit 720 or 820 as soon as it has been determined.
As can be seen from the above description, the processing unit 720 or 820 performing the predetermined operation according to the N addresses may include: performing the predetermined operation on the data stored at the N addresses.
When the predetermined operation is performed on the data stored at the N addresses, the processing unit 720 or 820 may access each of the N addresses and determine whether the data stored at that address satisfies a predetermined condition, and the output unit 730 or 830 may output one or more of the N addresses that satisfy the predetermined condition as the result.
In one example, before the accesses to all of the N addresses are completed, the addresses of the data satisfying the predetermined condition are stored in a buffer inside the memory. After the accesses to all of the N addresses are completed, the output unit 730 or 830 may output the addresses in the buffer as the result.
In one example, the processing unit 720 or 820 determining whether the data stored at the address satisfies the predetermined condition may include: performing, on the data and a predetermined condition value, an operation including a relational operation and/or a logical operation, and determining that the predetermined condition is satisfied when the operation result is true. Here, the relational operations may include, but are not limited to, equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to, and the logical operations may include, but are not limited to, AND, OR and XOR.
In addition, in one example, when the predetermined condition is determined to be satisfied, the processing unit 720 or 820 may replace the original value of the data with a new value, where the new value may be a fixed value or a function of the original value.
Obviously, in the apparatus 700 or 800 according to the embodiments of the present invention, the predetermined condition may be the same or different for the N addresses.
It is worth noting that the choice of the number of addresses, i.e., N, may depend on the actual situation, such as user requirements, hardware design, computing capability, etc.; for example, N may be 32, 64, and so on. N may be chosen appropriately so as not to affect the processing performance of the memory.
Furthermore, when the predetermined operation is performed on the N addresses, the processing unit 720 or 820 may perform at least one of an arithmetic operation, a relational operation and a logical operation on the data stored at the N addresses, and the output unit 730 or 830 may output the operation result as the result. The specific details of these operations have been described above and are not repeated here.
The following pseudocode shows an exemplary implementation for performing multiple accesses in a memory according to an embodiment of the present invention:
[The pseudocode is rendered as an image in the original publication (Figure PCTCN2015078863-appb-000001); its symbols are explained below.]
Here, base_address denotes the base address; bias_index[i] denotes a group of address indexes, which may be contiguous or non-contiguous; element_size denotes the address element size (e.g., 4 bytes); Function() denotes the predetermined operation to be performed; parameter denotes the one or more parameters required by the predetermined operation; and output indicates the address to which the result is to be output. In this case, the N addresses actually to be operated on can be determined as follows:
base_address+bias_index[0]*element_size
base_address+bias_index[1]*element_size
……
base_address+bias_index[N-1]*element_size
Of course, those skilled in the art will understand that bias[i] may be used in place of bias_index[i]*element_size, and that op, condition_value and new_value may each be a group of elements, or even expressions.
For example, when the predetermined operation performed on the data stored at the N addresses is determining whether the data stored at each address satisfies the predetermined condition, Function(parameter) may be, for example, Function(op, condition_value, new_value), where op denotes the operation to be performed, condition_value denotes the predetermined condition value, and new_value denotes the new value.
For example, when the predetermined operation performed on the data stored at the N addresses is at least one of an arithmetic operation, a relational operation and a logical operation, Function(parameter) may be, for example, Function(op), or even Function(op1, op2, op3, …) or Function(op1[i], op2[i], op3[i], …).
Therefore, those skilled in the art will understand that the functionality of Function(parameter) can be set arbitrarily according to design requirements and is not limited to the embodiments described above.
In addition, the embodiments of the present invention further include a storage system including the apparatus 700 or 800 supporting multiple accesses in a memory described above with reference to Fig. 7 or Fig. 8.
Although the graph data structure is taken as an example in the above description, the embodiments of the present invention are not limited thereto; those skilled in the art will understand that the embodiments of the present invention can also be applied to other data structures to achieve the effect of improving random-access performance.
It should be noted that, for clarity and conciseness, only the parts relevant to the embodiments of the present invention are shown in Fig. 7 and Fig. 8, but those skilled in the art should understand that the devices or apparatuses shown in Fig. 7 and Fig. 8 may include other necessary units.
Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses and units described above, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit described above may be implemented in the form of hardware or in the form of a software functional unit.
It should also be pointed out that, in the apparatus and method of the present invention, each component or step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present invention. Moreover, the steps performing the above series of processes may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically; some steps may be performed in parallel or independently of one another.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
This application claims priority to Chinese Patent Application No. 201410201149.6, filed on May 14, 2014, the disclosure of which is incorporated herein by reference in its entirety as part of this application.

Claims (29)

  1. A method for performing multiple accesses in a memory, comprising:
    receiving N addresses in the memory, wherein N is an integer greater than 1 and the N addresses are non-contiguous;
    performing a predetermined operation according to the N addresses; and
    outputting a result of the operation.
  2. The method according to claim 1, wherein, before outputting the result of the operation, the method further comprises:
    storing the result of the operation in a buffer inside the memory.
  3. The method according to claim 2, wherein outputting the result of the operation comprises:
    outputting the result of the operation stored in the buffer.
  4. The method according to any one of claims 1-3, wherein each address is determined by a base address and an offset, the offset indicating the distance between the address and the base address.
  5. The method according to claim 4, wherein receiving the N addresses in the memory further comprises:
    receiving the base address and N offsets; and
    determining each of the N addresses according to: i-th address = base address + i-th offset, 0<i≤N-1.
  6. The method according to claim 5, wherein receiving the N offsets further comprises receiving an address element size and N address indexes, and
    determining each of the N addresses according to: i-th address = base address + i-th offset comprises determining each of the N addresses according to: i-th address = base address + i-th address index × address element size.
  7. The method according to any one of claims 1-6, wherein performing the predetermined operation according to the N addresses comprises: performing the predetermined operation on data stored at the N addresses.
  8. The method according to claim 7, wherein, when the predetermined operation is performed on the data stored at the N addresses:
    each of the N addresses is accessed, and it is determined whether the data stored at that address satisfies a predetermined condition; and
    one or more of the N addresses that satisfy the predetermined condition are output as the result.
  9. The method according to any one of claims 2-8, wherein storing the result of the operation in the buffer inside the memory comprises:
    before accesses to all of the N addresses are completed, storing in the buffer inside the memory the addresses of the data satisfying the predetermined condition.
  10. The method according to claim 9, wherein outputting the one or more of the N addresses that satisfy the predetermined condition as the result comprises:
    outputting the addresses in the buffer as the result.
  11. The method according to any one of claims 8-10, wherein determining whether the data stored at the address satisfies the predetermined condition comprises:
    performing, on the data and a predetermined condition value, an operation including a relational operation and/or a logical operation; and
    determining that the predetermined condition is satisfied when the operation result is true,
    wherein the relational operations include equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to, and the logical operations include AND, OR and XOR.
  12. The method according to any one of claims 8-11, wherein the method further comprises: when the predetermined condition is determined to be satisfied, replacing the original value of the data with a new value,
    wherein the new value is a fixed value or a function of the original value.
  13. The method according to any one of claims 8-12, wherein the predetermined condition is the same or different for the N addresses.
  14. The method according to any one of claims 7-13, wherein, when the predetermined operation is performed on the data stored at the N addresses:
    at least one of an arithmetic operation, a relational operation and a logical operation is performed on the data stored at the N addresses; and
    the operation result is output as the result.
  15. An apparatus for supporting multiple accesses in a memory, comprising:
    a receiving unit configured to receive N addresses in the memory, wherein N is an integer greater than 1 and the N addresses are non-contiguous;
    a processing unit configured to perform a predetermined operation according to the N addresses; and
    an output unit configured to output a result of the operation.
  16. The apparatus according to claim 15, wherein the apparatus further comprises:
    a buffer configured to store the result of the operation.
  17. The apparatus according to claim 16, wherein the output unit outputs the result of the operation stored in the buffer.
  18. The apparatus according to any one of claims 15-17, wherein each address is determined by a base address and an offset, the offset indicating the distance between the address and the base address.
  19. The apparatus according to claim 18, wherein the receiving unit receives the base address and N offsets, and
    each of the N addresses is determined according to: i-th address = base address + i-th offset, 0<i≤N-1.
  20. The apparatus according to claim 19, wherein the receiving unit receiving the N offsets further comprises receiving an address element size and N address indexes, and
    determining each of the N addresses according to: i-th address = base address + i-th offset comprises determining each of the N addresses according to: i-th address = base address + i-th address index × address element size.
  21. The apparatus according to any one of claims 15-20, wherein the processing unit performing the predetermined operation according to the N addresses comprises: performing the predetermined operation on data stored at the N addresses.
  22. The apparatus according to claim 21, wherein, when the predetermined operation is performed on the data stored at the N addresses:
    the processing unit accesses each of the N addresses and determines whether the data stored at that address satisfies a predetermined condition; and
    the output unit outputs one or more of the N addresses that satisfy the predetermined condition as the result.
  23. The apparatus according to any one of claims 17-22, wherein, before accesses to all of the N addresses are completed, the addresses of the data satisfying the predetermined condition are stored in a buffer inside the memory.
  24. The apparatus according to claim 23, wherein the output unit outputs the addresses in the buffer as the result.
  25. The apparatus according to any one of claims 22-24, wherein the processing unit determining whether the data stored at the address satisfies the predetermined condition comprises:
    performing, on the data and a predetermined condition value, an operation including a relational operation and/or a logical operation; and
    determining that the predetermined condition is satisfied when the operation result is true,
    wherein the relational operations include equal to, greater than, greater than or equal to, less than, less than or equal to, and not equal to, and the logical operations include AND, OR and XOR.
  26. The apparatus according to any one of claims 22-25, wherein, when the predetermined condition is determined to be satisfied, the processing unit replaces the original value of the data with a new value,
    wherein the new value is a fixed value or a function of the original value.
  27. The apparatus according to any one of claims 22-26, wherein the predetermined condition is the same or different for the N addresses.
  28. The apparatus according to any one of claims 21-27, wherein, when the predetermined operation is performed on the data stored at the N addresses:
    the processing unit performs at least one of an arithmetic operation, a relational operation and a logical operation on the data stored at the N addresses; and
    the output unit outputs the operation result as the result.
  29. A storage system, comprising the apparatus according to any one of claims 15-28.
PCT/CN2015/078863 2014-05-14 2015-05-13 Method, apparatus and storage system for performing multiple accesses in memory WO2015172718A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP22213885.1A EP4180972A1 (en) 2014-05-14 2015-05-13 Method and apparatus for multiple accesses in memory and storage system
EP15792922.5A EP3144817A4 (en) 2014-05-14 2015-05-13 Method and apparatus for multiple accesses in memory and storage system
US15/310,984 US10956319B2 (en) 2014-05-14 2015-05-13 Method and apparatus for multiple accesses in memory and storage system, wherein the memory return addresses of vertexes that have not been traversed
JP2017512089A JP6389323B2 (ja) 2014-05-14 2015-05-13 メモリ中にマルチアクセスを行う方法、装置、及びメモリシステム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410201149.6 2014-05-14
CN201410201149.6A CN103942162B (zh) 2014-05-14 2014-05-14 Tsinghua University Method, apparatus and storage system for performing multiple accesses in memory

Publications (1)

Publication Number Publication Date
WO2015172718A1 true WO2015172718A1 (zh) 2015-11-19

Family

ID=51189834

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/078863 2014-05-14 2015-05-13 Method, apparatus and storage system for performing multiple accesses in memory WO2015172718A1 (zh)

Country Status (5)

Country Link
US (1) US10956319B2 (zh)
EP (2) EP3144817A4 (zh)
JP (1) JP6389323B2 (zh)
CN (1) CN103942162B (zh)
WO (1) WO2015172718A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942162B (zh) 2014-05-14 2020-06-09 Tsinghua University Method, apparatus and storage system for performing multiple accesses in memory
GB2533568B (en) * 2014-12-19 2021-11-17 Advanced Risc Mach Ltd Atomic instruction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101512499A (zh) * 2006-08-31 2009-08-19 Qualcomm Incorporated Relative address generation
CN102171649A (zh) * 2008-12-22 2011-08-31 Intel Corporation Method and system for queuing transfers of multiple non-contiguous address ranges with a single command
CN103238133A (zh) * 2010-12-08 2013-08-07 International Business Machines Corporation Vector gather buffer for multiple address vector loads
CN103942162A (zh) * 2014-05-14 2014-07-23 Tsinghua University Method, apparatus and storage system for performing multiple accesses in memory

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000067573A (ja) 1998-08-19 2000-03-03 Mitsubishi Electric Corp Memory with arithmetic function
US20040103086A1 (en) * 2002-11-26 2004-05-27 Bapiraju Vinnakota Data structure traversal instructions for packet processing
US7865701B1 (en) * 2004-09-14 2011-01-04 Azul Systems, Inc. Concurrent atomic execution
JP4300205B2 (ja) * 2005-08-02 2009-07-22 Toshiba Corporation Information processing system and information processing method
CN101506793B (zh) * 2006-08-23 2012-09-05 陈锦夫 Running an operating system in dynamic virtual memory
CN100446129C (zh) * 2006-09-07 2008-12-24 Huawei Technologies Co., Ltd. Memory fault test method and system
US8447962B2 (en) * 2009-12-22 2013-05-21 Intel Corporation Gathering and scattering multiple data elements
JP2010287279A (ja) * 2009-06-11 2010-12-24 Toshiba Corp Nonvolatile semiconductor memory device
CA2790009C (en) * 2010-02-18 2017-01-17 Katsumi Inoue Memory having information refinement detection function, information detection method using memory, device including memory, information detection method, method for using memory, and memory address comparison circuit
JP4588114B1 (ja) 2010-02-18 2010-11-24 Katsumi Inoue Memory having information refinement detection function, method of using the same, and device including the memory
US8904153B2 (en) * 2010-09-07 2014-12-02 International Business Machines Corporation Vector loads with multiple vector elements from a same cache line in a scattered load operation
US20120079459A1 (en) * 2010-09-29 2012-03-29 International Business Machines Corporation Tracing multiple threads via breakpoints
US8612676B2 (en) * 2010-12-22 2013-12-17 Intel Corporation Two-level system main memory
US9342453B2 (en) * 2011-09-30 2016-05-17 Intel Corporation Memory channel that supports near memory and far memory access
US8850162B2 (en) * 2012-05-22 2014-09-30 Apple Inc. Macroscalar vector prefetch with streaming access detection
US11074169B2 (en) * 2013-07-03 2021-07-27 Micron Technology, Inc. Programmed memory controlled data movement and timing within a main memory device
US9497206B2 (en) * 2014-04-16 2016-11-15 Cyber-Ark Software Ltd. Anomaly detection in groups of network addresses


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3144817A4 *

Also Published As

Publication number Publication date
CN103942162B (zh) 2020-06-09
EP3144817A1 (en) 2017-03-22
US10956319B2 (en) 2021-03-23
JP6389323B2 (ja) 2018-09-12
JP2017519317A (ja) 2017-07-13
EP4180972A1 (en) 2023-05-17
EP3144817A4 (en) 2017-07-26
US20170083236A1 (en) 2017-03-23
CN103942162A (zh) 2014-07-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15792922; Country of ref document: EP; Kind code of ref document: A1)
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase (Ref document number: 2017512089; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
REEP Request for entry into the european phase (Ref document number: 2015792922; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2015792922; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 15310984; Country of ref document: US)