WO2020073641A1 - A data-structure-oriented graphics processor data prefetching method and apparatus
- Publication number
- WO2020073641A1 (PCT/CN2019/084774)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- data
- prefetch
- request
- list
Classifications
- G06F9/383 — Operand prefetching
- G06F9/3802 — Instruction prefetching
- G06T1/20 — Processor architectures; processor configuration, e.g. pipelining
- G06F11/3037 — Monitoring arrangements where the monitored computing system component is a memory, e.g. virtual memory, cache
- G06F12/0811 — Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/0862 — Addressing of a memory level requiring associative addressing means, e.g. caches, with prefetch
- G06F12/0895 — Caches characterised by the organisation or structure of parts of caches, e.g. directory or tag array
- G06F9/345 — Addressing or accessing the instruction operand or the result; addressing modes of multiple operands or results
- G06F9/5016 — Allocation of resources to service a request, the resource being the memory
- G06T1/60 — Memory management
- G06F2212/1016 — Performance improvement
- G06F2212/455 — Caching of specific data: image or video data
- G06F2212/6022 — Using a prefetch buffer or dedicated prefetch cache
- G06F2212/6024 — History based prefetching
- G06T2200/28 — Indexing scheme involving image processing hardware
Definitions
- the invention relates to the field of graphics processor data prefetching, and in particular to a data structure-oriented graphics processor data prefetching method and device.
- Breadth-first search is the basic algorithm for graph traversal in many graph computing applications.
- for each irregular memory access, the GPU must generate more than one storage access request. This greatly reduces the GPU's memory access efficiency, which in turn prevents the GPU from effectively accelerating breadth-first search.
- the GPU's accesses to the graph data structure lack sufficient locality, so the cache miss rate for some data can be as high as 80%.
- as a result, the GPU cannot hide memory latency through massive parallelism, and the pipeline has to stall while waiting for data.
- the GPU therefore cannot fully utilize its powerful computing capability to accelerate the breadth-first search algorithm.
- Data prefetching is a technology that promises to improve memory access and cache efficiency.
- Typical data prefetchers on GPUs, such as stride-based data prefetchers, data-stream-based prefetchers, and data prefetchers based on global history access information, can effectively reduce the latency of regular memory accesses in applications.
- for irregular memory accesses, however, these typical prediction-based data prefetchers have a significantly higher prefetch error rate than for regular accesses.
- such a high prefetch error rate directly leads to reading too much useless data, which in turn causes serious cache pollution and wastes memory bandwidth.
- because prefetchers based on memory access pattern recognition cannot accurately identify complex irregular memory access patterns, these data prefetchers contribute little to reducing memory access latency and improving GPU execution efficiency.
- stride-based data prefetcher
- data-stream-based prefetcher
- data prefetcher based on global history access information
- the stride-based data prefetcher uses a table to record local memory access history.
- this information mainly includes: the program counter value (PC, used as the table index); the address of the last access (used to calculate the stride and the next access address); the stride between the last two access addresses (i.e. their difference); and a stride-valid bit (marking whether the currently recorded stride is valid). If the access addresses of load instructions with the same PC have a fixed stride, the stride-based data prefetcher calculates the address of the data to be prefetched from the stride value and the most recently accessed address.
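As an illustration, the stride table above can be modeled in a few lines of software; the class name, field names, and the policy of confirming a stride only after the same nonzero stride repeats are assumptions of this sketch, not details fixed by the text.

```python
class StridePrefetcher:
    """Toy model of a stride-based data prefetcher table.

    Each table entry, indexed by the PC of a load instruction, records
    the last accessed address, the stride between the last two
    accesses, and a valid bit for that stride.
    """

    def __init__(self):
        self.table = {}  # pc -> {"last_addr": ..., "stride": ..., "valid": ...}

    def access(self, pc, addr):
        """Record one demand access; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = {"last_addr": addr, "stride": 0, "valid": False}
            return None
        stride = addr - entry["last_addr"]
        # The stride becomes valid once the same nonzero stride repeats.
        entry["valid"] = stride != 0 and stride == entry["stride"]
        entry["stride"] = stride
        entry["last_addr"] = addr
        # Prefetch address = most recent address + confirmed stride.
        return addr + stride if entry["valid"] else None
```

With this confirmation policy, the third access with the same stride is the first one that triggers a prefetch.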
- the data-stream-based prefetcher usually tracks the access direction within a certain storage region.
- according to the identified direction, the data-stream-based prefetcher continuously reads data, in units of cache blocks, into the prefetcher's own buffer.
- the prefetched data is not stored directly in the on-chip cache, to prevent it from polluting the useful data already in the cache.
- only when a prefetched cache block is hit by an actual access does the cache store that block.
- the prefetcher's buffer is then refreshed.
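A software sketch of this behavior might look as follows; the block size, buffer depth, and promote-on-hit policy are illustrative assumptions rather than values taken from the text.

```python
BLOCK = 128  # assumed cache block size in bytes

class StreamPrefetcher:
    """Toy model of a data-stream prefetcher with a private buffer.

    Prefetched blocks live in the prefetcher's own buffer rather than
    the cache, so mispredictions do not pollute cached data; a block
    is stored into the cache only when a demand access hits it.
    """

    def __init__(self, depth=4):
        self.depth = depth   # how far ahead to stream
        self.buffer = []     # prefetched block addresses
        self.cache = set()   # blocks promoted into the cache
        self.last_block = None

    def access(self, addr):
        block = addr // BLOCK * BLOCK
        if block in self.buffer:
            # Demand hit on a prefetched block: move it into the cache.
            self.buffer.remove(block)
            self.cache.add(block)
        if self.last_block is not None and block != self.last_block:
            # Identify the stream direction and refresh the buffer.
            step = BLOCK if block > self.last_block else -BLOCK
            self.buffer = [block + step * i for i in range(1, self.depth + 1)]
        self.last_block = block
```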
- a data prefetcher based on global history access information uses a global history buffer (GHB) to record the address information of all cache-miss accesses in the entire system.
- GHB: global history buffer
- each GHB entry stores one miss address and a pointer, and these pointers link GHB entries with the same index in chronological order.
- the data prefetcher based on global history access information also uses an index table that stores the keys used to index GHB entries; a key can be the PC value of an instruction or the address of a cache miss.
- each key is paired with a pointer to the most recent GHB entry for that key, and through this pointer all GHB entries with the same key can be found.
- data prefetchers based on global history access information can be combined with other data prefetching mechanisms, such as stride-based and data-stream-based prefetchers, so that multiple memory access patterns can be identified.
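The GHB organization just described — a history buffer whose entries are chained per index key, plus an index table pointing at the newest entry for each key — can be sketched as follows (names are illustrative, and eviction of old entries is omitted):

```python
class GHB:
    """Sketch of a global history buffer for miss addresses.

    Each entry stores one miss address plus a link to the previous
    entry with the same key (e.g. the load PC), so all misses for a
    key can be walked from newest to oldest.  The index table maps
    each key to the position of its newest entry.
    """

    def __init__(self):
        self.entries = []      # list of (miss_addr, link_to_prev_pos)
        self.index_table = {}  # key -> position of newest entry

    def record_miss(self, key, addr):
        prev = self.index_table.get(key)       # chain to the prior entry
        self.entries.append((addr, prev))
        self.index_table[key] = len(self.entries) - 1

    def history(self, key):
        """Miss addresses recorded for `key`, newest first."""
        out, pos = [], self.index_table.get(key)
        while pos is not None:
            addr, pos = self.entries[pos]
            out.append(addr)
        return out
```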
- these three typical data prefetchers are all designed for one or more common regular memory access patterns. For the irregular access pattern of breadth-first search, however, their prefetching is very inefficient or even completely ineffective.
- to address this, the invention provides a data structure-oriented graphics processor data prefetching method and device for breadth-first search that can efficiently prefetch irregularly accessed data, has a simple hardware structure, and is transparent to the programmer.
- a data structure-oriented graphics processor data prefetching method, whose implementation steps include:
- monitoring the processor's storage access requests to the graph data structure in step 1) includes: the memory access monitor in the memory access instruction unit is responsible for monitoring ordinary memory read instruction accesses to the work list vector, and the memory access result buffer in the memory access instruction unit records all memory access request information and read data processed by the first-level cache.
- the data address of the prefetch request for the next item of the work list vector is the data address read by the storage access request plus the size of the read data.
- the detailed steps of generating a prefetch request for the vertex list vector in step 2) include: determining, from the node ID obtained when the prefetch request for the work list vector was last generated, the addresses of the corresponding row and the next row of the vertex list vector; if the two addresses fall in the same cache block, generating one storage access request that retrieves both rows at once; if they do not fall in the same cache block, generating two storage access requests that retrieve the corresponding row and the next row respectively.
- the detailed steps of generating a prefetch request for the edge list vector in step 2) include: the index generation unit generates the prefetch requests for the edge list vector according to the runtime start and end indexes of the edge list vector, and the number of generated requests depends mainly on how many cache blocks are needed to store the data of these edges, taking address alignment into account.
- the detailed steps of generating a prefetch request for the visited list vector in step 2) include: reading the returned result of the prefetch request for the edge list vector as the data from which the prefetched visited list vector addresses are computed, and generating a corresponding access request address for each value read.
- a data structure-oriented graphics processor data prefetching device includes a data prefetching unit distributed in each processing unit; the data prefetching unit is connected to the memory access monitor and the memory access result buffer of the memory access instruction unit and to the first-level cache, and the data prefetching unit includes:
- the address space classifier, used to select the corresponding data prefetch request generation method according to the type of the processor's storage access request to the graph data structure;
- the runtime information table, used to record the runtime information of the various vectors for each processing unit Warp; the runtime information of the various vectors includes the work list vector index, the vertex list vector index, and the start and end indexes of the edge list vector;
- the prefetch request generation unit, used to execute the designated data prefetch request generation method: if the storage access request is an ordinary read access to the work list vector, a prefetch request for the next item of the work list vector is generated; if the storage access request is a prefetch access to the work list vector, a prefetch request for the vertex list vector is generated; if the storage access request is a prefetch access to the vertex list vector, a prefetch request for the edge list vector is generated; if the storage access request is a prefetch request for the edge list vector, a prefetch request for the visited list vector is generated;
- the prefetch request queue, used to store the generated data prefetch requests.
- the address space classifier includes an address space range table and eight address comparators.
- the address space range table holds the start and end addresses of the address space ranges of the work list vector, vertex list vector, edge list vector, and visited list vector, eight addresses in all.
- one input of each of the eight address comparators is the accessed address from the memory access instruction unit's information, the other input is the corresponding address in the address space range table, and the output terminals of the eight address comparators are connected to the prefetch request generation unit.
- each entry of the runtime information table includes five fields: WID, work list vector index, vertex list vector index, and the edge list vector start and end indexes, where the WID records the ID of the processing unit Warp.
- the runtime information table also includes a selector, which updates the corresponding table entry based on the information source, the Warp ID, and the accessed data in the information from the memory access instruction unit.
- the information source is used to distinguish whether the memory access information comes from the memory access monitor or from the memory access result buffer. If it comes from the memory access monitor, it is determined that the data prefetching required for traversing a new node begins: the content of the entry corresponding to the WID in the runtime information table is cleared and the accessed data is recorded as the work index. If it comes from the memory access result buffer, the accessed data is recorded into the entry of the corresponding WID in the runtime information table.
- the prefetch request generation unit includes a prefetch generator selector, a work list vector prefetch request generation unit, a vertex list vector prefetch request generation unit, an edge list vector prefetch request generation unit, and a visited list vector prefetch request generation unit. According to the memory access type output by the address space classifier, the information source in the information from the memory access instruction unit, and the runtime information output by the runtime information table, the prefetch generator selector selects one of the four generation units to generate a prefetch request. The work list vector prefetch request generation unit generates a prefetch request for the next item of the work list vector and writes it to the prefetch request queue; the vertex list vector prefetch request generation unit generates a prefetch request for the vertex list vector and writes it to the prefetch request queue; the edge list vector and visited list vector prefetch request generation units likewise generate their respective prefetch requests and write them to the prefetch request queue.
- the data structure-oriented graphics processor data prefetching method of the present invention has the following advantages:
- the present invention can efficiently prefetch irregularly accessed data for breadth-first search.
- the data structure-oriented data prefetching mechanism of the present invention obtains the breadth-first search's access pattern over the graph data structure in an explicit way, and uses the information of the node currently being searched to read the data required for searching the next node into the on-chip cache in advance.
- the present invention has a simple hardware implementation: because the data prefetch unit does not need complex calculations, its computation logic is very simple; the main overhead of data prefetching comes from storing the graph data structure information and the runtime information, and this storage can be provided by on-chip shared memory.
- the present invention is transparent to the programmer.
- using the data structure-oriented data prefetching mechanism of the present invention does not require extensive changes to the original program; it is only necessary to replace the original storage space allocation code of the application with allocation code annotated with the data structure.
- the data structure-oriented graphics processor data prefetching device of the invention is the hardware embodiment of the data prefetching method of the invention, and therefore shares the aforementioned advantages of that method, which are not repeated here.
- FIG. 1 is a schematic diagram of the working principle of an existing stride-based data prefetcher.
- FIG. 2 is a schematic diagram of the working principle of an existing data-stream-based prefetcher.
- FIG. 3 is a schematic diagram of the working principle of an existing data prefetcher based on global history access information.
- FIG. 4 is a schematic flowchart of a method according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of the distributed arrangement of the data prefetching units in an embodiment of the present invention.
- FIG. 6 is a schematic diagram of the basic structure and interfaces of the data prefetching unit in an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of the data prefetching unit in an embodiment of the present invention.
- the implementation steps of the data structure-oriented graphics processor data prefetching method in this embodiment include:
- the data structure-oriented graphics processor data prefetching method is implemented by the data prefetching unit in the processor: the processor's storage access requests to the graph data structure are monitored, and the monitored storage access request information and read data are sent to the data prefetching unit; after receiving the storage access request information, the data prefetching unit selects the corresponding data prefetch request generation unit, and thereby the corresponding data prefetch request generation method, according to whether the storage access request is a data prefetch request and which data vector of the graph data structure the request accesses.
- the Compressed Sparse Row (CSR) format (a compressed format used to store large sparse graphs) is used for the data-driven breadth-first search algorithm over the graph data structure; it contains four data vectors: the work list vector, the vertex list vector, the edge list vector, and the visited list vector.
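To make the four-vector CSR layout concrete, the following sketch shows one data-driven BFS iteration; its four numbered reads are exactly the access chain (work list → vertex list → edge list → visited list) that the prefetcher anticipates. The function and variable names are illustrative.

```python
def bfs_step(worklist, vertex, edge, visited):
    """One data-driven BFS iteration over a CSR graph.

    worklist: node IDs to expand this iteration
    vertex:   row-offset vector; vertex[v]..vertex[v+1] index into edge
    edge:     concatenated neighbor lists of all nodes
    visited:  per-node flag, set once a node has been discovered
    """
    next_worklist = []
    for v in worklist:                         # 1) read work list vector
        start, end = vertex[v], vertex[v + 1]  # 2) read vertex list vector
        for e in edge[start:end]:              # 3) read edge list vector
            if not visited[e]:                 # 4) read/update visited vector
                visited[e] = True
                next_worklist.append(e)
    return next_worklist
```

Because step 4 depends on step 3, which depends on step 2, which depends on step 1, none of these addresses follow a regular stride — which is why the prefetcher must follow the data structure itself.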
- if the storage access request is an ordinary read access to the work list vector, the data prefetch unit generates a prefetch request for the next item of the work list vector; if the storage access request is a prefetch access to the work list vector, the data prefetch unit generates a prefetch request for the vertex list vector; if the storage access request is a prefetch access to the vertex list vector, the data prefetch unit generates a prefetch request for the edge list vector; if the storage access request is a prefetch request for the edge list vector, the data prefetch unit generates a prefetch request for the visited list vector.
- monitoring the processor's storage access requests to the graph data structure in step 1) includes: the memory access monitor in the memory access instruction unit is responsible for monitoring ordinary memory read instruction accesses to the work list vector, and the memory access result buffer in the memory access instruction unit records all memory access request information and read data processed by the first-level cache.
- the data address of the prefetch request for the next item of the work list vector is the data address read by the storage access request plus the size of the read data.
- the detailed steps of generating a prefetch request for the vertex list vector in step 2) include: determining, from the node ID obtained when the prefetch request for the work list vector was last generated, the addresses of the corresponding row and the next row of the vertex list vector; if the two addresses fall in the same cache block, generating one storage access request that retrieves both rows at once; if they do not fall in the same cache block, generating two storage access requests that retrieve the corresponding row and the next row respectively.
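Assuming, purely for illustration, 4-byte vertex list entries and 128-byte cache blocks, the one-request-or-two decision can be sketched as:

```python
BLOCK = 128  # assumed cache block size in bytes
ELEM = 4     # assumed size of one vertex list entry in bytes

def vertex_prefetch_requests(vertex_base, node_id):
    """Block addresses to fetch for vertex[node_id] and vertex[node_id + 1].

    One request suffices when both rows fall in the same cache block;
    otherwise two requests are generated, one per block.
    """
    row = vertex_base + node_id * ELEM
    nxt = row + ELEM
    if row // BLOCK == nxt // BLOCK:
        return [row // BLOCK * BLOCK]
    return [row // BLOCK * BLOCK, nxt // BLOCK * BLOCK]
```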
- the detailed steps of generating a prefetch request for the edge list vector in step 2) include: the index generation unit generates the prefetch requests for the edge list vector according to the runtime start and end indexes of the edge list vector.
- the number of generated requests depends mainly on how many cache blocks are needed to store the data of these edges, taking address alignment into account.
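Under the same illustrative sizes (4-byte edge entries, 128-byte cache blocks), the number of prefetch requests for an edge range is the number of aligned cache blocks the range spans:

```python
BLOCK = 128  # assumed cache block size in bytes
ELEM = 4     # assumed size of one edge list entry in bytes

def edge_prefetch_count(edge_base, start_idx, end_idx):
    """Number of cache-block prefetch requests for edge[start_idx:end_idx].

    The byte range is rounded out to cache-block boundaries (address
    alignment), and one request is issued per block it touches.
    """
    first_block = (edge_base + start_idx * ELEM) // BLOCK
    last_block = (edge_base + end_idx * ELEM - 1) // BLOCK
    return last_block - first_block + 1
```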
- the detailed steps of generating a prefetch request for the visited list vector in step 2) include: reading the returned result of the prefetch request for the edge list vector as the data from which the prefetched visited list vector addresses are computed, and generating a corresponding access request address for each value read.
- the data structure-oriented graphics processor data prefetching device of this embodiment includes a data prefetching unit distributed in each processing unit.
- the data prefetching unit is connected to the memory access monitor of the memory access instruction unit, to the memory access result buffer, and to the first-level cache.
- a GPU contains multiple streaming multiprocessors (SM), and each SM contains many simple single-threaded processor cores.
- SM: streaming multiprocessor
- every 32 hardware threads form a processing unit for resource scheduling, called a Warp; all threads in the same Warp execute the same instruction at the same time.
- this embodiment uses a distributed data prefetch unit in each streaming multiprocessor to process and generate data prefetch requests.
- the data prefetch unit includes:
- the address space classifier 1 is used to select the corresponding data prefetch request generation method according to the type of the processor's storage access request to the graph data structure;
- runtime information table 2 is used to record the runtime information of the various vectors for each processing unit Warp; this runtime information includes the work list vector index, the vertex list vector index, and the start and end indexes of the edge list vector;
- the prefetch request generation unit 3 is used to execute the designated data prefetch request generation method: if the storage access request is an ordinary read access to the work list vector, a prefetch request for the next item of the work list vector is generated; if the storage access request is a prefetch access to the work list vector, a prefetch request for the vertex list vector is generated; if the storage access request is a prefetch access to the vertex list vector, a prefetch request for the edge list vector is generated; if the storage access request is a prefetch request for the edge list vector, a prefetch request for the visited list vector is generated;
- the prefetch request queue 4 is used to store the generated data prefetch requests.
- the graph data structure access information obtained by the data prefetch unit mainly comes from two components of the memory access instruction unit (Load/Store Unit): the memory access monitor and the memory access result buffer (responseFIFO).
- the memory access monitor is responsible for monitoring ordinary memory read instruction accesses to the work list vector; when such an access is observed, the data prefetch unit knows that a new search iteration has started and begins preparing to prefetch the data required for the next iteration.
- the memory access result buffer is responsible for recording all memory access request information and read data processed by the first-level cache.
- the memory access result buffer can monitor the processing status of prefetch requests and send the requested data and access information to the data prefetching unit.
- using the information from the memory access instruction unit together with the breadth-first search's access pattern over the graph data structure, the data prefetching unit can generate the corresponding data prefetch requests. After receiving information from the memory access instruction unit, the data prefetch unit updates the corresponding entry in runtime information table 2 according to the Warp ID in the message, and prefetch request generation unit 3 selects the data prefetch request generator according to the source of the information and which vector of the graph data structure the access request touches.
- the prefetch unit puts the newly generated data prefetch request into the prefetch request queue 4.
- the prefetch request generating unit 3 in the data prefetch unit is responsible for controlling the number of data prefetch requests generated.
- the first-level cache not only handles ordinary memory access requests, but also processes data prefetch requests, treating them as ordinary memory access requests.
- the address space classifier 1 includes an address space range table and eight address comparators.
- the address space range table holds the start and end addresses of the address space ranges of the work list vector, vertex list vector, edge list vector, and visited list vector, eight addresses in all.
- one input of each of the eight address comparators is the accessed address from the memory access instruction unit's information, the other input is the corresponding address in the address space range table, and the output terminals of the eight address comparators are connected to prefetch request generation unit 3.
- by comparing the address of a memory access request with the address space ranges of all data vectors, the address comparators determine which data vector's address space the request falls in.
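The classification logic above can be sketched in Python as follows. The address ranges here are made-up placeholders (the real bounds come from the runtime layout of the four vectors), and in hardware the four range checks run in parallel across the eight comparators rather than in a loop:

```python
# Hypothetical sketch of address space classifier 1: each accessed address
# is compared against the [start, end) range of the four graph data vectors.
RANGES = {  # assumed example layout, not taken from the patent
    "work_list":    (0x88021D00, 0x88031D00),
    "vertex_list":  (0x90020000, 0x90030000),
    "edge_list":    (0xA0000000, 0xA0100000),
    "visited_list": (0xB0000000, 0xB0010000),
}

def classify(addr):
    """Return the data vector whose address space contains addr, or None."""
    for vector, (start, end) in RANGES.items():
        if start <= addr < end:  # two comparators per vector, eight in total
            return vector
    return None
```

An address that falls outside every range belongs to none of the graph vectors and triggers no prefetch generator.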
- runtime information table 2 updates the corresponding entry according to the Warp ID carried in the received information.
- each entry in runtime information table 2 contains five fields: WID, work list vector index, vertex list vector index, and the start and end indices of the edge list vector, where WID records the Warp ID of the processing unit. Runtime information table 2 also includes a selector, which updates the table based on the information source of the message from the load/store unit, the Warp ID of the processing unit, and the accessed data.
- the information source distinguishes whether the memory access information came from the memory access monitor or from the memory access result buffer.
- if the information came from the memory access monitor, the data prefetch unit determines that the data prefetching required for traversing a new node is beginning: the content of the entry corresponding to the WID in runtime information table 2 is cleared and the accessed data is recorded in the work list index. If it came from the memory access result buffer, the accessed data is recorded into the entry corresponding to the WID in runtime information table 2.
- the five fields of each entry of runtime information table 2 (WID, work list vector index, vertex list vector index, and the start and end indices of the edge list vector) are as follows:
- WID indicates the Warp to which the recorded information belongs. All received memory access information carries a Warp ID, which is compared with the WIDs in the runtime information table to determine which entry to update. As shown in Figure 7, 0, 1, and 2 denote Warp0, Warp1, and Warp2, respectively.
- the work list vector index indicates which node entry in the work list vector the prefetch unit is currently prefetching. This field is updated from the ordinary work-list access information observed by the memory access monitor: determining the position of the work-list entry a Warp is currently accessing yields the position of the entry the Warp will access in the next cycle, namely the item following the current one. For example, as shown in Figure 7, the entry with WID 0 has a work index of 2, indicating that Warp0 is traversing item 1 of the work list vector while the data prefetch unit prefetches the data required for item 2.
- the vertex list vector index indicates the node ID this Warp will traverse in the next cycle. This field is updated from the work-list prefetch access information observed by the memory access result buffer. From the accessed address of the prefetch request and the work list index recorded in the runtime information table, the data address of the prefetch request can be determined; the corresponding data is then read out and the vertex index updated.
- the start and end indices of the edge list vector mark the range of all edges of the node this Warp will traverse in the next cycle.
- these fields are updated from the vertex-list prefetch access information observed by the memory access result buffer.
- from that information the data address of the prefetch request can be determined and the corresponding data read out. Since the start and end indices of a node's edges are two adjacent items in the vertex list vector, if the two values fall in the same cache block they can be obtained with a single memory access; otherwise they must be read by two separate prefetches.
- for example, the address corresponding to vertex index 1279 is 0x900232FC, and the address corresponding to the next vertex index, 1280, is 0x90023300. These two addresses lie in two different cache blocks, so two prefetches are needed to obtain the start index and end index of the edge list vector.
- the current state indicates that the unit is waiting for the prefetch request to address 0x90023300 to complete in order to update the end index of the edge list.
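The same-cache-block check behind this example can be sketched as follows, assuming a 128 B cache block and 4 B vertex-list entries; the base address used in the test is a hypothetical value chosen so that index 1279 maps to the address from the example above:

```python
CACHE_BLOCK = 128  # typical GPU L1 cache line size in bytes
ENTRY_SIZE = 4     # assumed 4-byte vertex-list entries

def vertex_prefetch_addrs(base, node_id):
    """Addresses needed to fetch vertex_list[node_id] and vertex_list[node_id + 1].

    Returns one address when both entries share a cache block,
    and two addresses when the pair straddles a block boundary.
    """
    a = base + node_id * ENTRY_SIZE
    b = a + ENTRY_SIZE
    if a // CACHE_BLOCK == b // CACHE_BLOCK:
        return [a]       # one memory access retrieves both values
    return [a, b]        # two separate prefetch requests are needed
```

With a base of 0x90021F00, index 1279 lands at 0x900232FC, the last word of its 128 B block, so the next entry falls in the following block and two requests are generated.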
- the selector of runtime information table 2 updates the table from three inputs: the Warp ID, the information source, and the accessed data. (1) Since every piece of memory access information carries a Warp ID, the selector determines which entry to update by matching the Warp ID against the WIDs in the table. (2) The information source indicates whether the memory access information came from the memory access monitor (denoted 0) or the memory access result buffer (denoted 1): if from the memory access monitor, data prefetching for the traversal of a new node is beginning, so the entry with the matching WID is cleared and the accessed data is recorded in the work index; if from the memory access result buffer, the accessed data is recorded in the corresponding field of the matching entry. (3) The accessed data supplies the value that is written into the table.
- runtime information is output only after both the start and end indices of the edge list have been acquired; they are then sent to edge list vector prefetch request generation unit 34 to generate a prefetch request for the edge list vector. The work list index and vertex index, by contrast, can be forwarded to the prefetch request generation unit at the same time as the runtime information table is updated.
- prefetch request generation unit 3 includes a prefetch generation unit selector 31 and four prefetch request generators, which are responsible for generating memory access requests for the four data vectors respectively.
- specifically, prefetch request generation unit 3 includes prefetch generation unit selector 31, work list vector prefetch request generation unit 32, vertex list vector prefetch request generation unit 33, edge list vector prefetch request generation unit 34, and visited list vector prefetch request generation unit 35. Based on the type of memory access information output by address space classifier 1, the information source in the message from the load/store unit, and the runtime information output by runtime information table 2, prefetch generation unit selector 31 selects one of the four generation units 32, 33, 34, and 35 to generate the prefetch request; work list vector prefetch request generation unit 32 generates the prefetch request for the next item of the work list vector.
- in other words, the data prefetch unit provides four prefetch request generation units (work list vector unit 32, vertex list vector unit 33, edge list vector unit 34, and visited list vector unit 35) that generate prefetch requests for the four data vectors respectively.
- the four prefetch request generation units fall into two categories: work list vector prefetch request generation unit 32, and the three units that generate prefetch requests for the other data vectors. The distinction exists because the information the generators need comes from different sources: the memory access monitor for the former and the memory access result buffer for the latter.
- that is, the data prefetch unit uses the ordinary work-list access information observed by the memory access monitor to generate prefetch requests for the work list vector, whereas data prefetch requests for the other data vectors require the prefetch request information observed by the memory access result buffer.
- the data prefetch unit also uses the access address of the observed prefetch request to select whether to generate a prefetch request for the vertex list vector, the edge list vector, or the visited list vector. According to the data structure access pattern of breadth-first search, the access order of the data vectors can be predicted.
- when the data prefetch unit receives prefetch request information for the work list vector, it generates a prefetch request for the vertex list vector; when it receives prefetch request information for the vertex list vector, it generates a prefetch request for the edge list vector. Likewise, on receiving a prefetch access request for the edge list vector, the data prefetch unit generates an access request for the visited list vector. Therefore, the source of the information received by the data prefetch unit, together with the access address of the observed request, determines which prefetch request generation unit is used.
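The selection chain described above can be sketched as a small lookup; the string tags for the information source and vector names are illustrative placeholders, not signals defined by the patent:

```python
# Hypothetical sketch of the generator-selection logic: breadth-first
# search fixes the order in which the four data vectors are touched,
# so the vector hit by an observed request determines which prefetch
# request generator fires next.
NEXT_VECTOR = {
    "work_list":   "vertex_list",   # work item read -> fetch its vertex entry
    "vertex_list": "edge_list",     # vertex entry read -> fetch its edge range
    "edge_list":   "visited_list",  # edges read -> fetch visited flags
}

def select_generator(source, accessed_vector):
    """Pick the generation unit for the next prefetch request.

    source: "monitor" for an ordinary work-list read seen by the memory
    access monitor, "response_fifo" for a completed prefetch seen in the
    memory access result buffer.
    """
    if source == "monitor":
        return "work_list"                   # a new iteration begins
    return NEXT_VECTOR.get(accessed_vector)  # follow the BFS access chain
```

A completed visited-list prefetch maps to `None`: it is the end of the chain and triggers no further generator.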
- work list vector prefetch request generation unit 32 is responsible for generating prefetch requests for the work list vector.
- the address of the prefetch request is the address of the item following the data requested by the ordinary memory read instruction; that is, the prefetch data address is the data address read by the ordinary memory read instruction plus the size of the data. For example, if the ordinary memory read instruction reads the data at address 0x88021d00, the work list prefetch request generation unit will generate a prefetch request for the data at address 0x88021d04.
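This next-item address computation is a one-liner; the 4-byte default mirrors the example above and is an assumption about the entry size:

```python
def work_list_prefetch_addr(addr, data_size=4):
    """Prefetch address for the next work-list item: the observed read
    address plus the size of the data that was read (4 B assumed)."""
    return addr + data_size
```

For a read at 0x88021d00 this yields 0x88021d04, matching the example in the text.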
- the vertex list vector records, for each row of the graph's adjacency matrix, the starting position of that row's non-zero elements within the edge list vector (each row of the adjacency matrix corresponds to a node of the graph, and the non-zero elements are the edges connected to that node; these edges are stored contiguously in the edge list vector). Therefore, to determine the number of edges of a node, the values of that node and of the next node must both be read from the vertex list vector.
- vertex list vector prefetch request generation unit 33 obtains the addresses of the node and the next node in the vertex list vector from the index. Typically, when the two values lie in the same cache block, a single memory access request can retrieve both at once; if the two values are not in the same cache block, unit 33 generates two memory access requests.
- edge list vector prefetch request generation unit 34 generates prefetch requests for the edge list vector.
- for example, the start and end indices of the edge list in the entry with WID 2 of the runtime information table are 24003 and 25210 respectively, meaning that all edges connecting node 2020 lie in edge list items 24003 through 25210 (excluding item 25210). Because the edges of each node are stored contiguously in the edge list vector, the number of requests generated depends mainly on how many cache blocks are needed to hold the data of these edges, and address misalignment must also be taken into account.
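The block-count computation can be sketched as follows, assuming 128 B cache blocks and 4 B edge entries; the base address in the test is a hypothetical placeholder. For the patent's example range (items 24003 to 25210, 1207 edges, 4828 bytes), a misaligned start stretches the range to 38 cache blocks:

```python
CACHE_BLOCK = 128  # assumed GPU cache block size in bytes
ENTRY_SIZE = 4     # assumed 4-byte edge-list entries

def edge_list_prefetch_addrs(base, start_idx, end_idx):
    """Block-aligned prefetch addresses covering edge_list[start_idx:end_idx).

    The edges of a node are contiguous, so the request count is simply
    the number of cache blocks the byte range spans; a misaligned start
    or end contributes a partial block.
    """
    first = base + start_idx * ENTRY_SIZE          # first byte of the range
    last = base + end_idx * ENTRY_SIZE - 1         # last byte of the range
    first_block = first // CACHE_BLOCK * CACHE_BLOCK
    last_block = last // CACHE_BLOCK * CACHE_BLOCK
    return list(range(first_block, last_block + 1, CACHE_BLOCK))
```

One prefetch request is issued per returned block address.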
- the edge list vector stores the end node ID of each edge, so visited list vector prefetch request generation unit 35 reads the end node IDs returned by the edge list vector prefetch requests and uses each end node ID as an index into the visited list vector to compute the address of the corresponding visited list value.
- since these end node IDs are discontinuous and scattered, visited list vector prefetch request generation unit 35 must generate a separate visited-list access request address for each value in the prefetched cache block.
- a GPU cache block is usually 128 B, so with a data size of 4 B one cache block holds 32 end node IDs, and the visited list vector prefetch request generation unit must generate 32 access requests for these end node IDs.
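The fan-out from one prefetched edge-list block to 32 visited-list requests can be sketched as follows; the little-endian unsigned-int layout and the visited-list base address are assumptions for illustration:

```python
import struct

ID_SIZE = 4  # assumed 4-byte end-node IDs -> 32 IDs per 128 B cache block

def visited_prefetch_addrs(visited_base, edge_block):
    """One visited-list address per end-node ID in a prefetched edge-list
    cache block. The IDs are scattered, so each value yields its own
    access request rather than a contiguous range."""
    count = len(edge_block) // ID_SIZE
    ids = struct.unpack("<%dI" % count, edge_block)
    return [visited_base + nid * ID_SIZE for nid in ids]
```

A full 128 B block therefore produces exactly 32 request addresses, one per end node ID it contains.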
- the data structure-aware data prefetching method and device of this embodiment use the data structure access pattern defined by breadth-first search together with graph data structure information to generate corresponding data prefetch requests. Compared with existing GPU data prefetching mechanisms, this method and device prefetch the data required by breadth-first-search graph traversal more accurately and efficiently, thereby improving GPU performance on graph computation problems.
Claims (10)
- 1. A data structure-aware graphics processing unit data prefetching method, characterized in that the implementation steps comprise: 1) obtaining information on the monitored processor core's memory access requests to the graph data structure and the data read; 2) selecting the corresponding data prefetch request generation mode according to the type of the memory access request: if the memory access request is an ordinary read access to the work list vector, generating a prefetch request for the next item of the work list vector; if the memory access request is a prefetch access to the work list vector, generating a prefetch request for the vertex list vector; if the memory access request is a prefetch access to the vertex list vector, generating a prefetch request for the edge list vector; if the memory access request is a prefetch request for the edge list vector, generating a prefetch request for the visited list vector; 3) storing the generated prefetch request into the prefetch request queue.
- 2. The data structure-aware graphics processing unit data prefetching method according to claim 1, characterized in that obtaining the monitored processor core's memory access requests to the graph data structure in step 1) comprises: the memory access monitor in the load/store unit monitoring ordinary memory read instruction accesses to the work list vector, and the memory access result buffer in the load/store unit recording all memory access request information processed by the first-level cache together with the data read.
- 3. The data structure-aware graphics processing unit data prefetching method according to claim 1, characterized in that when generating the prefetch request for the next item of the work list vector in step 2), the data address of that prefetch request is the result of adding the size of the read data to the data address read by the memory access request.
- 4. The data structure-aware graphics processing unit data prefetching method according to claim 1, characterized in that the detailed steps of generating the prefetch request for the vertex list vector in step 2) comprise: determining, from the node ID obtained when the previous work list vector prefetch request was generated, the addresses of the corresponding row and the next row of the vertex list vector for the prefetch request; if the addresses of the corresponding row and the next row are in the same cache block, generating one memory access request to retrieve both at once; if they are not in the same cache block, generating two memory access requests to retrieve the corresponding row and the next row separately.
- 5. The data structure-aware graphics processing unit data prefetching method according to claim 1, characterized in that the detailed steps of generating the prefetch request for the edge list vector in step 2) comprise: the generation unit producing prefetch requests for the edge list vector according to the runtime start and end indices of the edge list vector, where the number of requests generated depends mainly on how many cache blocks are needed to store the data of these edges and on how many cache blocks address alignment requires.
- 6. The data structure-aware graphics processing unit data prefetching method according to claim 1, characterized in that the detailed steps of generating the prefetch request for the visited list vector in step 2) comprise: reading the returned results of the prefetch requests for the edge list vector as the data for computing the visited list vector prefetch, and generating a corresponding access request address for each value read.
- 7. A data structure-aware graphics processing unit data prefetching apparatus, characterized by comprising a data prefetch unit distributed in each processing unit, the data prefetch unit being connected respectively to the memory access monitor of the load/store unit, the memory access result buffer, and the first-level cache, the data prefetch unit comprising: an address space classifier (1) for selecting the corresponding data prefetch request generation mode according to the type of the processor core's memory access request to the graph data structure; a runtime information table (2) for recording the runtime information of the various vectors in each processing unit Warp, the runtime information of the various vectors including the index of the work list vector, the index of the vertex list vector, and the start and end indices of the edge list vector; a prefetch request generation unit (3) for executing different data prefetch request generation modes as designated: if the memory access request is an ordinary read access to the work list vector, generating a prefetch request for the next item of the work list vector; if the memory access request is a prefetch access to the work list vector, generating a prefetch request for the vertex list vector; if the memory access request is a prefetch access to the vertex list vector, generating a prefetch request for the edge list vector; if the memory access request is a prefetch request for the edge list vector, generating a prefetch request for the visited list vector; and a prefetch request queue (4) for storing the generated data prefetch requests.
- 8. The data structure-aware graphics processing unit data prefetching apparatus according to claim 7, characterized in that the address space classifier (1) comprises an address space range table and eight address comparators, the address space range table comprising eight addresses in one-to-one correspondence with the start and end addresses of the address space ranges of the work list vector, vertex list vector, edge list vector, and visited list vector; one input of each of the eight address comparators is the accessed address in the information from the load/store unit and the other input is the corresponding address in the address space range table, and the outputs of the eight address comparators are connected to the prefetch request generation unit (3) respectively.
- 9. The data structure-aware graphics processing unit data prefetching apparatus according to claim 7, characterized in that each entry of the runtime information table (2) contains five fields: WID, work list vector index, vertex list vector index, and start and end indices of the edge list vector, where WID records the Warp ID of the processing unit; the runtime information table (2) further comprises a selector for updating the corresponding entry of the runtime information table (2) according to the information source in the message from the load/store unit, the Warp ID of the processing unit Warp, and the accessed data; the information source distinguishes whether the memory access information came from the memory access monitor or from the memory access result buffer; if from the memory access monitor, it is determined that the data prefetching required for traversing a new node begins, the content of the entry corresponding to the WID in the runtime information table (2) is cleared, and the accessed data is recorded into the work list index; if from the memory access result buffer, the accessed data is recorded into the entry corresponding to the WID in the runtime information table (2).
- 10. The data structure-aware graphics processing unit data prefetching apparatus according to claim 7, characterized in that the prefetch request generation unit (3) comprises a prefetch generation unit selector (31), a work list vector prefetch request generation unit (32), a vertex list vector prefetch request generation unit (33), an edge list vector prefetch request generation unit (34), and a visited list vector prefetch request generation unit (35); the prefetch generation unit selector (31), according to the type of memory access information output by the address space classifier (1), the information source in the message from the load/store unit, and the runtime information output by the runtime information table (2), selects one of the four units (32), (33), (34), and (35) to generate the prefetch request; the work list vector prefetch request generation unit (32) generates a prefetch request for the next item of the work list vector and writes it into the prefetch request queue (4); the vertex list vector prefetch request generation unit (33) generates a prefetch request for the vertex list vector and writes it into the prefetch request queue (4); the edge list vector prefetch request generation unit (34) generates a prefetch request for the edge list vector and writes it into the prefetch request queue (4); the visited list vector prefetch request generation unit (35) generates a prefetch request for the visited list vector and writes it into the prefetch request queue (4).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/960,894 US11520589B2 (en) | 2018-10-11 | 2019-04-28 | Data structure-aware prefetching method and device on graphics processing unit |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811183490.8A CN109461113B (zh) | 2018-10-11 | 2018-10-11 | 一种面向数据结构的图形处理器数据预取方法及装置 |
CN201811183490.8 | 2018-10-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020073641A1 true WO2020073641A1 (zh) | 2020-04-16 |
Family
ID=65607513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/084774 WO2020073641A1 (zh) | 2018-10-11 | 2019-04-28 | 一种面向数据结构的图形处理器数据预取方法及装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11520589B2 (zh) |
CN (1) | CN109461113B (zh) |
WO (1) | WO2020073641A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109461113B (zh) | 2018-10-11 | 2021-07-16 | 中国人民解放军国防科技大学 | 一种面向数据结构的图形处理器数据预取方法及装置 |
CN111124675B (zh) * | 2019-12-11 | 2023-06-20 | 华中科技大学 | 一种面向图计算的异构存内计算设备及其运行方法 |
CN113741567B (zh) * | 2021-11-08 | 2022-03-29 | 广东省新一代通信与网络创新研究院 | 矢量加速器及其控制方法、装置 |
CN114218132B (zh) * | 2021-12-14 | 2023-03-24 | 海光信息技术股份有限公司 | 信息预取方法、处理器、电子设备 |
CN114565503B (zh) * | 2022-05-03 | 2022-07-12 | 沐曦科技(北京)有限公司 | Gpu指令数据管理的方法、装置、设备及存储介质 |
CN116821008B (zh) * | 2023-08-28 | 2023-12-26 | 英特尔(中国)研究中心有限公司 | 具有提高的高速缓存命中率的处理装置及其高速缓存设备 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899156A (zh) * | 2015-05-07 | 2015-09-09 | 中国科学院信息工程研究所 | 一种面向大规模社交网络的图数据存储及查询方法 |
CN104952032A (zh) * | 2015-06-19 | 2015-09-30 | 清华大学 | 图的处理方法、装置以及栅格化表示及存储方法 |
US20170060958A1 (en) * | 2015-08-27 | 2017-03-02 | Oracle International Corporation | Fast processing of path-finding queries in large graph databases |
CN109461113A (zh) * | 2018-10-11 | 2019-03-12 | 中国人民解放军国防科技大学 | 一种面向数据结构的图形处理器数据预取方法及装置 |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8089486B2 (en) * | 2005-03-21 | 2012-01-03 | Qualcomm Incorporated | Tiled prefetched and cached depth buffer |
KR100703709B1 (ko) * | 2005-06-02 | 2007-04-06 | 삼성전자주식회사 | 그래픽스 처리장치와 처리방법, 및 그 기록 매체 |
CN100481028C (zh) * | 2007-08-20 | 2009-04-22 | 杭州华三通信技术有限公司 | 一种利用缓存实现数据存储的方法和装置 |
US8397049B2 (en) * | 2009-07-13 | 2013-03-12 | Apple Inc. | TLB prefetching |
US20140184630A1 (en) * | 2012-12-27 | 2014-07-03 | Scott A. Krig | Optimizing image memory access |
CN104156264B (zh) * | 2014-08-01 | 2017-10-10 | 西北工业大学 | 一种基于多gpu的基带信号处理任务并行实时调度方法 |
US9535842B2 (en) * | 2014-08-28 | 2017-01-03 | Oracle International Corporation | System and method for performing message driven prefetching at the network interface |
US10180803B2 (en) * | 2015-07-28 | 2019-01-15 | Futurewei Technologies, Inc. | Intelligent memory architecture for increased efficiency |
US9990690B2 (en) * | 2015-09-21 | 2018-06-05 | Qualcomm Incorporated | Efficient display processing with pre-fetching |
US20170091103A1 (en) * | 2015-09-25 | 2017-03-30 | Mikhail Smelyanskiy | Instruction and Logic for Indirect Accesses |
US10423411B2 (en) * | 2015-09-26 | 2019-09-24 | Intel Corporation | Data element comparison processors, methods, systems, and instructions |
US20180189675A1 (en) * | 2016-12-31 | 2018-07-05 | Intel Corporation | Hardware accelerator architecture and template for web-scale k-means clustering |
CN111124675B (zh) * | 2019-12-11 | 2023-06-20 | 华中科技大学 | 一种面向图计算的异构存内计算设备及其运行方法 |
- 2018-10-11: CN CN201811183490.8A patent/CN109461113B/zh active Active
- 2019-04-28: US US16/960,894 patent/US11520589B2/en active Active
- 2019-04-28: WO PCT/CN2019/084774 patent/WO2020073641A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899156A (zh) * | 2015-05-07 | 2015-09-09 | 中国科学院信息工程研究所 | 一种面向大规模社交网络的图数据存储及查询方法 |
CN104952032A (zh) * | 2015-06-19 | 2015-09-30 | 清华大学 | 图的处理方法、装置以及栅格化表示及存储方法 |
US20170060958A1 (en) * | 2015-08-27 | 2017-03-02 | Oracle International Corporation | Fast processing of path-finding queries in large graph databases |
CN109461113A (zh) * | 2018-10-11 | 2019-03-12 | 中国人民解放军国防科技大学 | 一种面向数据结构的图形处理器数据预取方法及装置 |
Non-Patent Citations (1)
Title |
---|
GUO, HUI ET AL.: "Accelerating BFS via Data Structure-Aware Prefetching on GPU", 16 October 2018 (2018-10-16), pages 60234 - 60246, XP011698514, ISSN: 2169-3536 * |
Also Published As
Publication number | Publication date |
---|---|
CN109461113B (zh) | 2021-07-16 |
CN109461113A (zh) | 2019-03-12 |
US11520589B2 (en) | 2022-12-06 |
US20200364053A1 (en) | 2020-11-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19871112 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19871112 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.11.2021) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19871112 Country of ref document: EP Kind code of ref document: A1 |