CN111581441A - Accelerator for cluster computation - Google Patents

Info

Publication number
CN111581441A
Authority
CN
China
Prior art keywords
request
module
key
index
values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910874345.2A
Other languages
Chinese (zh)
Other versions
CN111581441B (en)
Inventor
徐晓画
孙唐
谈笑
周鹏飞
何振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yixin Industry Co Ltd
Original Assignee
Shanghai Yixin Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yixin Industry Co Ltd
Priority to CN202210095018.9A (CN114461861A)
Publication of CN111581441A
Application granted
Publication of CN111581441B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The application provides an accelerator for cluster computation, comprising an arbitration module, a data transfer module, and a distance calculation module. The arbitration module sends a second request to the data transfer module according to a received first request. When the first request carries a keyword, the data transfer module transfers the keyword and X index values from a DDR to the distance calculation module according to the second request, where X is an integer greater than or equal to 2. The distance calculation module performs distance calculation on the keyword according to the X index values to obtain a first index value. Directly computing the distance between the keyword and all stored data requires a large number of calculations. The accelerator instead classifies all data by index value, obtains the index value for the keyword, and thereby obtains the class to which the keyword belongs. This reduces the number of calculations and addresses the technical problem that prior-art information processing systems search data slowly.

Description

Accelerator for cluster computation
Technical Field
The present application relates to storage technology, and in particular to an accelerator for cluster computation and to performing read (Get) and write (Put) operations using the accelerator.
Background
FIG. 1A illustrates a block diagram of a solid-state storage device. The solid-state storage device 102 is coupled to a host and provides storage capability to the host. The host and the solid-state storage device 102 may be coupled in various ways, including but not limited to SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIE (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, or a wireless communication network. A host is an information processing device capable of communicating with a storage device in the manner described above, such as a personal computer, tablet, server, laptop, network switch, router, cellular telephone, or personal digital assistant. The storage device 102 includes an interface 103, a control component 104, one or more NVM chips 105, and a DRAM (Dynamic Random Access Memory) 110.
NAND flash memory, phase change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetoresistive RAM), RRAM (Resistive Random Access Memory), XPoint memory, and the like are common types of NVM.
The interface 103 may be adapted to exchange data with a host by means such as SATA, IDE, USB, PCIE, NVMe, SAS, ethernet, fibre channel, etc.
The control component 104 controls data transfer among the interface 103, the NVM chips 105, and the DRAM 110, and is also used for storage management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, and the like. The control component 104 can be implemented in software, hardware, firmware, or a combination thereof; for example, it can take the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. The control component 104 may also include a processor or controller that executes software to operate the hardware of the control component 104 and process IO (Input/Output) commands. The control component 104 may also be coupled to the DRAM 110 and may access data in the DRAM 110. The FTL table and/or cached IO command data may be stored in the DRAM.
The control component 104 includes a flash interface controller (also referred to as a media interface controller or flash channel controller) that is coupled to the NVM chips 105. It issues commands to the NVM chips 105 in a manner that conforms to the interface protocol of the NVM chips 105, in order to operate the NVM chips 105 and receive the command execution results they output. Known NVM chip interface protocols include "Toggle", "ONFI", etc.
In the storage device, mapping information from logical addresses to physical addresses is maintained by a Flash Translation Layer (FTL). The logical addresses constitute the storage space of the solid-state storage device as perceived by upper-level software, such as an operating system. A physical address is an address used to access a physical storage unit of the solid-state storage device. In the related art, address mapping may also be implemented through an intermediate address form, for example by mapping a logical address to an intermediate address, which in turn is mapped to a physical address. In these cases, the read/write commands received by the storage device indicate logical addresses.
A table structure storing mapping information from logical addresses to physical addresses is called an FTL table. FTL tables are important metadata in solid state storage devices. Typically, entries of the FTL table record address mapping relationships in units of data pages in the storage device.
For some storage devices, the FTL is provided by the host to which the storage device is coupled: the FTL table is stored in the memory of the host, and the FTL is implemented by software executed on the CPU of the host. In other cases, a storage management device disposed between the host and the storage device provides the FTL. In these cases, the read/write commands received by the storage device indicate physical addresses.
A storage device supporting the Key-Value (KV) storage model provides key-based read (Get(Key)) and write (Put(Key, Value)) operations. To perform a write operation, the host provides a key (Key) and data (Value) to the storage device; the data is written to the storage device, and the key serves as the index of the written data. To perform a read operation, the host provides a key to the storage device, and the storage device finds the data based on the key and returns it to the host. Thus, in a KV storage system, the key is the index used to access the data, and the value is the data being accessed. In general, the lengths of the key and the data are not fixed. Optionally, to reduce complexity, the lengths of the keys and/or data may be limited to a specified range.
KV storage devices or distributed storage systems utilizing KV storage devices are provided in Chinese patent applications having application numbers 201711392529.2, 201711474660.3, 201810332295.0, 201810286955.6, and 201810286986.1.
FIG. 1B shows a schematic diagram of a prior-art address translation system for a KV storage device. The address translation system (also referred to as an FTL table) of the KV storage device provides a mapping from a key (K, also referred to as a keyword) to a logical address or a physical address (e.g., a PPA, Physical Page Address). In response to obtaining a key (K), the FTL table is queried using the key as an index to obtain the corresponding logical or physical address.
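As a minimal sketch of the KV storage model described above, the following Python class uses an in-memory dictionary as a stand-in for the storage device. The class name `KVStore` and its method names are illustrative assumptions and do not come from the application.

```python
class KVStore:
    """Toy in-memory stand-in for a KV storage device (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        # The key serves as the index of the written data.
        self._data[key] = value

    def get(self, key: bytes):
        # Return the data stored under the key, or None if the key is absent.
        return self._data.get(key)


store = KVStore()
store.put(b"user:42", b"payload")
```

Keys and values are byte strings of arbitrary length here, which mirrors the statement that their lengths are generally not fixed.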
Since the information processing system stores a large amount of data, a large amount of calculation is required to query the FTL table according to the keyword each time, resulting in a slow data search speed.
Disclosure of Invention
The embodiment of the application provides the accelerator for clustering calculation, so as to solve the technical problem that the data searching speed of an information processing system in the prior art is slow.
According to a first aspect of the present application, a first accelerator for cluster computation according to the first aspect of the present application is provided, including an arbitration module, a data transfer module, and a distance calculation module, where the arbitration module sends a second request to the data transfer module according to a received first request, where the first request carries a keyword, or the first request carries the keyword and a first index value corresponding to the keyword, the first request is a read request or a write request, and the second request is a data transfer request; when the first request carries the keyword, the data carrying module carries the keyword and X index values from the DDR to the distance calculation module according to the second request, wherein X is an integer greater than or equal to 2; and the distance calculation module performs distance calculation on the keyword according to the X index values to obtain the first index value, wherein the first index value is the index value with the minimum distance from the keyword in the X index values.
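The clustering step of the first accelerator can be sketched as follows: given a keyword and X index values, return the index value at the smallest distance. This is a software sketch only. It assumes that keywords and index values are numeric vectors and that the distance is Euclidean, neither of which is fixed by the passage above, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    # Assumed metric; the passage above does not name a specific distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(keyword, index_values):
    """Return the position of the index value closest to the keyword."""
    if len(index_values) < 2:
        raise ValueError("X must be an integer greater than or equal to 2")
    return min(range(len(index_values)),
               key=lambda i: euclidean(keyword, index_values[i]))
```

For example, with the keyword `[1.0, 1.0]` and the index values `[[0, 0], [1, 2], [5, 5]]`, the second index value (position 1) is returned because its distance of 1 is the smallest.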
The first accelerator for cluster computation according to the first aspect of the present application provides a second accelerator for cluster computation according to the first aspect of the present application, where when the first request carries the keyword and the first index value, the data transfer module transfers the keyword and N feature values corresponding to the first index value from the DDR to the distance computation module according to the second request; the distance calculation module performs distance calculation on the keyword according to the N characteristic values to acquire M first characteristic values with the minimum distance from the keyword from the N characteristic values, wherein N and M are integers which are greater than or equal to 1, and M is smaller than or equal to N.
According to the first or second accelerators for cluster computation of the first aspect of the present application, there is provided the third accelerator for cluster computation of the first aspect of the present application, wherein the accelerator further includes a normalization module, and before the distance computation module receives the keyword and the X index values, the normalization module performs a normalization operation on the keyword and the X index values, and sends the normalized keyword and the X index values to the distance computation module.
The second or third accelerator for cluster calculation according to the first aspect of the present application provides a fourth accelerator for cluster calculation according to the first aspect of the present application, wherein when the first request carries the keyword and the first index value, the distance calculation module calculates a distance between each of the N feature values and the keyword to obtain N first distances; the distance calculation module sorts the N first distances from small to large according to values to obtain a sorted distance queue; the distance calculation module selects the M first characteristic values from the distance queue.
According to one of the second to fourth accelerators for cluster computation of the first aspect of the present application, there is provided the fifth accelerator for cluster computation according to the first aspect of the present application, wherein M is less than or equal to 256.
The fourth accelerator for cluster computation according to the first aspect of the present application provides the sixth accelerator for cluster computation according to the first aspect of the present application, wherein the distance calculation module sorts the N first distances in an insertion sorting manner, and a queue length of the distance queue maintains 256 entries in the sorting process.
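The search step of the second, fourth, and sixth accelerators (compute N distances, keep them sorted by insertion, bound the queue at 256 entries, select the M smallest) might be modeled as below. This is a sketch under assumptions: the distance function is passed in by the caller, `bisect.insort` stands in for the hardware insertion step, and the names `top_m_nearest` and `max_queue` are hypothetical.

```python
import bisect


def top_m_nearest(keyword, features, m, distance, max_queue=256):
    """Return the m feature values closest to the keyword.

    A sorted queue of (distance, position) pairs is maintained by
    insertion and never grows beyond max_queue entries.
    """
    if not 1 <= m <= max_queue:
        raise ValueError("m must be between 1 and max_queue")
    queue = []
    for pos, feature in enumerate(features):
        d = distance(keyword, feature)
        # Skip candidates that cannot enter a full queue.
        if len(queue) == max_queue and d >= queue[-1][0]:
            continue
        bisect.insort(queue, (d, pos))
        if len(queue) > max_queue:
            queue.pop()  # drop the current farthest entry
    return [features[pos] for _, pos in queue[:m]]
```

Bounding the queue means the working storage stays constant regardless of N, which is one plausible reason for fixing the queue length and limiting M to 256.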
The seventh accelerator for cluster calculation according to the first aspect of the present application is provided according to one of the first to sixth accelerators for cluster calculation according to the first aspect of the present application, wherein when the distance calculation module performs distance calculation on the keyword, the distance calculation module calculates a distance between each index value of the X index values and the keyword to obtain X second distances; the distance calculation module compares the magnitudes of any two second distances among the X second distances, and determines the index value of the smallest second distance among the X index values as the first index value.
According to one of the first to seventh accelerators for cluster computation of the first aspect of the present application, there is provided the eighth accelerator for cluster computation of the first aspect of the present application, wherein the distance computation module includes Y computation units, the keyword includes Z sets of values, Y and Z are integers greater than or equal to 1, and Z is less than or equal to Y, the distance computation module utilizes the Y computation units to perform distance computation on the Z sets of values at the same time according to the X index values, and obtains an index value corresponding to each value in the Z sets of values.
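The parallel arrangement of the eighth accelerator, in which Y calculation units classify Z sets of values at the same time, can be approximated in software with a worker pool. This sketch assumes squared Euclidean distance and uses threads purely as a stand-in for hardware calculation units; `classify_groups` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_groups(groups, index_values, num_units):
    """Map each of the Z sets of values to its nearest index value,
    using at most num_units (Y) workers in parallel."""
    if not 1 <= len(groups) <= num_units:
        raise ValueError("Z must satisfy 1 <= Z <= Y")

    def nearest(group):
        # Squared Euclidean distance is an assumption of this sketch.
        return min(
            range(len(index_values)),
            key=lambda i: sum((g - v) ** 2
                              for g, v in zip(group, index_values[i])),
        )

    with ThreadPoolExecutor(max_workers=num_units) as pool:
        return list(pool.map(nearest, groups))
```

Because Z is less than or equal to Y, every set of values gets its own worker and no set has to wait for another to finish.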
According to one of the first to eighth accelerators for cluster computation of the first aspect of the present application, there is provided the ninth accelerator for cluster computation according to the first aspect of the present application, wherein the accelerator further comprises a result processing module, and after the distance calculation module obtains a calculation result, the result processing module sends the calculation result to the DDR.
According to one of the first to ninth accelerators for cluster computation of the first aspect of the present application, there is provided the tenth accelerator for cluster computation according to the first aspect of the present application, wherein the X index values are preconfigured.
According to one of the accelerators for cluster computation of the first aspect of the present application, there is provided the eleventh accelerator for cluster computation according to the first aspect of the present application, wherein the arbitration module provides a plurality of channels to the firmware, and the arbitration module receives the requests sent by the firmware to the plurality of channels in a round-robin arbitration manner.
An eleventh accelerator for cluster computing according to the first aspect of the present application provides the twelfth accelerator for cluster computing according to the first aspect of the present application, wherein the arbitration module processes requests sent by the firmware to the plurality of channels by using an arbitration manner of weighted round robin scheduling.
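Round-robin and weighted round-robin arbitration over several channels can be sketched as follows. The per-channel weights and the rule of serving up to `weight` requests per channel in each round are assumptions of this sketch, since the passage does not specify how weights are applied; plain round-robin is the special case in which every weight is 1.

```python
from collections import deque


def weighted_round_robin(channels, weights):
    """Return the order in which queued requests are served.

    channels: one list of pending requests per channel.
    weights:  positive integers; channel i is served up to weights[i]
              requests in each round.
    """
    if len(channels) != len(weights) or any(w < 1 for w in weights):
        raise ValueError("one positive weight is required per channel")
    queues = [deque(c) for c in channels]
    order = []
    while any(queues):  # an empty deque is falsy
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue:
                    break
                order.append(queue.popleft())
    return order
```

With channels `[["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]`, equal weights serve the requests as a1, b1, c1, a2, c2, a3, while weights `[2, 1, 1]` give the first channel two slots per round.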
According to one of the accelerators for cluster computation of the first aspect of the present application, there is provided the thirteenth accelerator for cluster computation according to the first aspect of the present application, wherein the first request further indicates a channel used by the accelerator to process the first request, and the arbitration module instructs at least one calculation unit of the distance calculation module to process the first request according to the channel indicated by the first request.
According to the twelfth accelerator for cluster computation of the first aspect of the present application, there is provided the fourteenth accelerator for cluster computation according to the first aspect of the present application, wherein the arbitration module allocates a calculation unit identifier to each first message in a plurality of first messages; a first calculation unit identifier corresponds to a first calculation unit in the distance calculation module and instructs the first calculation unit to process the first message carrying the first calculation unit identifier.
According to the fourteenth accelerator for cluster computation of the first aspect of the present application, there is provided the fifteenth accelerator for cluster computation according to the first aspect of the present application, wherein the arbitration module instructs the distance calculation module to process the first messages waiting in the plurality of channels after the distance calculation module completes the processing of the plurality of first messages.
According to the fourteenth or fifteenth accelerator for cluster computation of the first aspect of the present application, there is provided the sixteenth accelerator for cluster computation according to the first aspect of the present application, wherein if the distance calculation module has not completed processing the plurality of first messages, the arbitration module does not instruct the distance calculation module to process the messages in the plurality of channels, even if the number of the plurality of first messages is smaller than the number of calculation units in the distance calculation module and an idle calculation unit exists in the distance calculation module.
According to one of the accelerators for cluster computation of the first aspect of the present application, there is provided the seventeenth accelerator for cluster computation according to the first aspect of the present application, wherein the arbitration module aggregates a plurality of first messages and instructs the at least one calculation unit of the distance calculation module to process the aggregated plurality of first messages in parallel.
According to one of the first to seventeenth accelerators for cluster computation of the first aspect of the present application, there is provided the eighteenth accelerator for cluster computation according to the first aspect of the present application, wherein the distance calculation module further includes Y management units, each management unit is coupled to one of the Y calculation units, the arbitration module assigns a calculation unit identifier to each of the Z sets of values when sending the second request, each management unit obtains from the Z sets of values the set whose calculation unit identifier matches that of its own calculation unit, and each management unit sends the obtained set of values to the calculation unit coupled to it.
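The batching behavior described for the fourteenth to seventeenth accelerators (aggregate messages, tag each with a calculation unit identifier, and start the next batch only after the current one finishes) can be modeled as a simple partition of the pending messages. This sketch assumes that the arbiter fills each batch with up to Y messages in arrival order; that policy and the name `schedule_batches` are assumptions, not details given in the text.

```python
def schedule_batches(pending, num_units):
    """Group pending messages into batches of at most num_units.

    Each message in a batch is paired with a calculation unit
    identifier. Batches are meant to run strictly one after another,
    so units left idle by a short batch stay idle until it completes.
    """
    if num_units < 1:
        raise ValueError("at least one calculation unit is required")
    batches = []
    for start in range(0, len(pending), num_units):
        batch = pending[start:start + num_units]
        batches.append([(unit_id, msg) for unit_id, msg in enumerate(batch)])
    return batches
```

In the last batch of the usage case `schedule_batches(["m0", "m1", "m2"], 2)`, unit 1 has no message, which corresponds to the situation in which an idle calculation unit exists but no further messages are dispatched until the batch is done.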
According to a second aspect of the present application, there is provided a first information processing system according to the second aspect of the present application, including a host, an accelerator, a mapping manager, a processor, and a solid state disk, where the host generates a first request according to an interface function and sends the first request to the processor, where the first request is a read request or a write request, and the first request carries a keyword, or the first request carries the keyword and a first index value corresponding to the keyword; the processor generates a second request according to the first request, wherein the second request comprises one or more of a clustering request, a search request, an address allocation request, an address acquisition request, and an access request of a storage device; and the processor uses the second request to operate at least one of the accelerator, the mapping manager, and the solid state disk so as to complete the processing of the one or more second requests, obtain the processing result of the first request, and return the processing result to the host.
The first information processing system according to the second aspect of the present application provides the second information processing system according to the second aspect of the present application, wherein when the first request is a write request and the first request carries the keyword and the first index value, the processor generates the address allocation request and an access request of the storage device; or, when the first request is a write request and the first request carries the keyword, the processor generates the clustering request, the address allocation request and an access request of the storage device.
The first or second information processing system according to the second aspect of the present application provides the third information processing system according to the second aspect of the present application, wherein when the first request is a read request and the first request carries the keyword and the first index value, the processor generates the address acquisition request, the access request of the storage device, and the search request; or, when the first request is a read request and the first request carries the keyword, the processor generates the clustering request, the address acquisition request, the access request of the storage device, and the search request; or, when the first request is a read index request (Get_Index) and the first request carries the keyword, the processor only generates the clustering request.
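The rules by which the processor derives second requests from a first request, as stated for the second and third information processing systems, can be summarized in a small lookup function. The string labels and the function name `plan_second_requests` are hypothetical; only the mapping itself follows the text.

```python
def plan_second_requests(op, has_index):
    """Return the ordered list of second requests for a first request.

    op:        "write", "read", or "get_index"
    has_index: True if the first request already carries the first
               index value in addition to the keyword.
    """
    if op == "get_index":
        # A read index request only needs the clustering step.
        return ["cluster"]
    if op == "write":
        steps = ["address_allocation", "storage_access"]
    elif op == "read":
        steps = ["address_acquisition", "storage_access", "search"]
    else:
        raise ValueError(f"unknown operation: {op}")
    # Without a first index value, clustering must run first.
    return steps if has_index else ["cluster"] + steps
```

The only difference between the two variants of each operation is whether the clustering request is needed to obtain the first index value.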
The first or second information processing system according to the second aspect of the present application provides the fourth information processing system according to the second aspect of the present application, wherein if the second request generated by the processor includes the address allocation request and an access request of the storage device, the processor sends the address allocation request to the mapping manager, and obtains the storage address provided by the mapping manager; and the processor sends an access request of the storage device carrying the storage address to the solid state disk to instruct the solid state disk to store the data associated with the keyword according to the storage address.
According to an information processing system of the second aspect of the present application, there is provided the fifth information processing system according to the second aspect of the present application, wherein if the second request generated by the processor includes the clustering request, the address allocation request, and the access request, the processor sends the clustering request to the accelerator and obtains a first index value returned by the accelerator; the processor sends the address allocation request carrying the obtained first index value and/or the keyword to the mapping manager, and obtains a storage address allocated to the keyword by the mapping manager according to the first index value; and the processor sends an access request of the storage device carrying the storage address to the solid state disk to instruct the solid state disk to store the data associated with the keyword according to the storage address.
According to the first or third information processing system of the second aspect of the present application, there is provided the sixth information processing system according to the second aspect of the present application, wherein if the second request generated by the processor only includes the clustering request, the processor sends the clustering request to the accelerator and obtains a first index value returned by the accelerator; and the processor returns the first index value to the host as a result of processing the first request.
The first or third information processing system according to the second aspect of the present application provides the seventh information processing system according to the second aspect of the present application, wherein if the second request generated by the processor includes the address obtaining request, the access request, and the search request, the processor sends the address obtaining request carrying the obtained first index value to the mapping manager, and obtains at least one storage address of at least one feature value corresponding to the first index value returned by the mapping manager; the processor accesses the solid state disk by using the at least one storage address and acquires the at least one characteristic value returned by the solid state disk; the processor sends the search request indicating the keyword and the at least one feature value to the accelerator, so as to obtain at least one first feature value corresponding to the keyword, which is selected by the accelerator from the at least one feature value, and returns the at least one first feature value to the processor, where the first feature value is a feature value whose distance from the keyword is smaller than a preset threshold value or M feature values whose distances from the keyword are the smallest in the at least one feature value, and M is greater than or equal to 1; the processor returns the at least one first characteristic value to the host as a result of processing the first request.
The eighth information processing system according to the second aspect of the present application is provided according to the first or third information processing system of the second aspect of the present application, wherein if the second request generated by the processor includes the clustering request, the address obtaining request, the access request, and the search request, the processor sends the clustering request to the accelerator and obtains a first index value returned by the accelerator; the processor sends the address acquisition request carrying the acquired first index value to the mapping manager, and acquires the at least one storage address of at least one characteristic value corresponding to the first index value returned by the mapping manager; the processor accesses the solid state disk by using the at least one storage address to acquire the at least one characteristic value returned by the solid state disk; the processor sends the search request indicating the keyword and the at least one feature value to the accelerator, and obtains the at least one first feature value returned by the accelerator, where the first feature value is a feature value whose distance from the keyword is smaller than a preset threshold or the at least one first feature value is M feature values whose distances from the keyword are minimum in the at least one feature value, and M is greater than or equal to 1; the processor returns the at least one first characteristic value to the host as a result of processing the first request.
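The read path of the seventh and eighth information processing systems (cluster, acquire addresses, access the solid state disk, search) can be traced with toy components. Everything below is a sketch under assumptions: keywords and feature values are plain numbers, distance is the absolute difference, and the class and method names are hypothetical stand-ins for the accelerator, mapping manager, and solid state disk.

```python
class ToyAccelerator:
    def __init__(self, index_values):
        self.index_values = index_values

    def cluster(self, keyword):
        # Position of the index value nearest to the keyword.
        return min(range(len(self.index_values)),
                   key=lambda i: abs(keyword - self.index_values[i]))

    def search(self, keyword, features, m):
        # The m feature values nearest to the keyword.
        return sorted(features, key=lambda f: abs(keyword - f))[:m]


class ToyMappingManager:
    def __init__(self, table):
        self.table = table  # first index value -> storage addresses

    def get_addresses(self, index):
        return self.table.get(index, [])


class ToySSD:
    def __init__(self, pages):
        self.pages = pages  # storage address -> feature value

    def read(self, address):
        return self.pages[address]


def handle_read(keyword, accelerator, mapping, ssd, m=1, index=None):
    if index is None:                        # clustering request
        index = accelerator.cluster(keyword)
    addresses = mapping.get_addresses(index)  # address acquisition request
    features = [ssd.read(a) for a in addresses]  # access request
    return accelerator.search(keyword, features, m)  # search request
```

Passing `index` explicitly corresponds to the seventh system, where the first request already carries the first index value and the clustering request is skipped.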
According to one of the first, second, fourth, and fifth information processing systems of the second aspect of the present application, there is provided the ninth information processing system according to the second aspect of the present application, wherein the processor indicates completion of processing of the first request to the host in response to receiving, from the solid state disk, completion information indicating that storage of the keyword is complete.
According to one of the first to ninth information processing systems of the second aspect of the present application, there is provided the tenth information processing system of the second aspect of the present application, wherein the interface function carries at least one of a type of the request, the keyword, and the first index value.
According to one of the first to tenth information processing systems of the second aspect of the present application, there is provided the eleventh information processing system of the second aspect of the present application, wherein if the second request generated by the processor includes at least two requests among the clustering request, the search request, the address allocation request, the address acquisition request, and the access request, the processor generates the at least two requests at one time according to the first request; or the processor generates the at least two requests according to the first request over multiple times.
According to the eleventh information processing system of the second aspect of the present application, there is provided the twelfth information processing system according to the second aspect of the present application, wherein if the second request generated by the processor includes the clustering request, the address allocation request, and the access request, the processor generates the clustering request according to the first request and sends the clustering request to the accelerator; in response to the clustering request, the accelerator determines a first index value corresponding to the keyword and returns the first index value to the processor; after receiving the first index value, the processor generates the address allocation request according to the first request and the first index value, and sends the address allocation request to the mapping manager; in response to the address allocation request, the mapping manager allocates a storage address for the keyword according to the first index value and returns the storage address to the processor, wherein the storage address is used for accessing the solid state disk; after receiving the storage address, the processor generates an access request of the solid state disk according to the first request and the storage address, and sends the access request of the solid state disk to the solid state disk; and in response to the access request of the solid state disk, the solid state disk stores the keyword according to the storage address.
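The staged write path of the twelfth information processing system, in which each second request is generated only after the previous result arrives, can be sketched as below. The toy classes, the sequential address allocator, and the choice to store a value under the allocated address are assumptions made for illustration.

```python
class ToyAccelerator:
    def __init__(self, index_values):
        self.index_values = index_values

    def cluster(self, keyword):
        # Position of the index value nearest to the keyword.
        return min(range(len(self.index_values)),
                   key=lambda i: abs(keyword - self.index_values[i]))


class ToyMappingManager:
    def __init__(self):
        self.next_free = 0
        self.by_index = {}  # first index value -> allocated addresses

    def allocate(self, keyword, index):
        # Hypothetical policy: hand out addresses sequentially and
        # record them under the first index value.
        address = self.next_free
        self.next_free += 1
        self.by_index.setdefault(index, []).append(address)
        return address


class ToySSD:
    def __init__(self):
        self.pages = {}

    def write(self, address, value):
        self.pages[address] = value


def handle_write(keyword, value, accelerator, mapping, ssd, index=None):
    if index is None:
        index = accelerator.cluster(keyword)    # 1. clustering request
    address = mapping.allocate(keyword, index)  # 2. address allocation request
    ssd.write(address, value)                   # 3. access request
    return address
```

Recording each allocated address under its first index value is what later allows a read to fetch only the feature values that belong to the same cluster.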
According to a third aspect of the present application, there is provided a first host according to the third aspect of the present application, including an application program module, an API interface module, a management module, an acceleration driver module, a mapping driver module, and a hard disk driver module, where the application program module calls an API provided by the API interface module; the management module sends a driving request to at least one of the acceleration driving module, the mapping driving module and the hard disk driving module according to the called API; if the driving request is received, the acceleration driving module operates an accelerator coupled with the acceleration driving module, and returns a first operation result to the management module; if the driving request is received, the mapping driving module operates a mapping manager coupled with the mapping driving module, and returns a second operation result to the management module; if the drive request is received, the hard disk drive module operates the solid state disk coupled with the hard disk drive module, and returns a third operation result to the management module; and after receiving the first operation result, the second operation result or the third operation result, the management module returns information for indicating that the called API is called to be completed to the application program module through the API interface module.
According to the first host of the third aspect of the present application, there is provided the second host of the third aspect of the present application, wherein the first operation result indicates a first index value corresponding to a keyword, the keyword being provided by the application program when calling the API; the second operation result indicates one or more storage addresses for accessing the solid state disk; the third operation result indicates that the solid state disk has completed the write operation on the one or more storage addresses, or indicates at least one first feature value read from the solid state disk, where the first feature value is a feature value whose distance from the keyword is smaller than a preset threshold, or the at least one first feature value is the M feature values having the smallest distances from the keyword among at least one feature value, M being greater than or equal to 1.
According to a fourth aspect of the present application, there is provided a first information processing method according to the fourth aspect of the present application, comprising: in response to receiving a first request indicating a keyword (Key), generating one or more second requests, the second requests including one or more of a clustering request, a search request, an address allocation request, an address acquisition request, and an access request of a storage device; and generating a processing result of the first request according to the processing results of the one or more second requests.
According to the first information processing method of the fourth aspect of the present application, there is provided the second information processing method of the fourth aspect of the present application, wherein: the clustering request indicates obtaining, from X index values, the index value closest to the keyword, X being a positive integer; the search request indicates obtaining the M feature values closest to the keyword among N feature values, M and N being positive integers; the address allocation request indicates acquiring one or more storage addresses corresponding to the keyword and/or the first index value; the address acquisition request indicates acquiring one or more storage addresses, corresponding to the keyword, at which the X index values are stored; the access request of the storage device instructs the storage device to access the one or more storage addresses.
According to the first or second information processing method of the fourth aspect of the present application, there is provided the third information processing method according to the fourth aspect of the present application, further comprising: if the first request is a request that is generated by calling a Get(Key) API and indicates a keyword, generating a clustering request according to the keyword, generating an address acquisition request using the index value returned for the clustering request, generating an access request of the storage device according to the storage address returned for the address acquisition request, generating a search request according to the X index values returned for the access request of the storage device, and taking the M feature values returned for the search request as the processing result of the first request.
According to one of the first to third information processing methods of the fourth aspect of the present application, there is provided the fourth information processing method according to the fourth aspect of the present application, further comprising: if the first request is a request that is generated by calling a Get_With_Index(Key, Index) API and indicates a keyword and an index value, generating an address acquisition request according to the index value, generating an access request of the storage device according to the storage address returned for the address acquisition request, generating a search request according to the X index values returned for the access request of the storage device, and taking the M feature values returned for the search request as the processing result of the first request.
According to one of the first to fourth information processing methods of the fourth aspect of the present application, there is provided the fifth information processing method according to the fourth aspect of the present application, further comprising: if the first request is a request that is generated by calling a Get_Index(Key) API and indicates a keyword, generating a clustering request according to the keyword, and taking the index value returned for the clustering request as the processing result of the first request.
According to one of the first to fifth information processing methods of the fourth aspect of the present application, there is provided the sixth information processing method according to the fourth aspect of the present application, further comprising: if the first request is a request that is generated by calling a Put(Key) API and indicates a keyword, generating a clustering request according to the keyword, generating an address allocation request using the keyword and the index value returned for the clustering request, and generating an access request of the storage device according to the storage address returned for the address allocation request, so as to write data corresponding to the keyword into the storage device.
According to one of the first to fifth information processing methods of the fourth aspect of the present application, there is provided the seventh information processing method according to the fourth aspect of the present application, further comprising: if the first request is a request that is generated by calling a Put_With_Index(Key, Index) API and indicates a keyword and an index value, generating an address allocation request according to the index value and the keyword, and generating an access request of the storage device according to the storage address returned for the address allocation request, so as to write data corresponding to the keyword into the storage device.
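As a non-limiting illustration, the correspondence between the five APIs named above and the ordered second requests they give rise to can be written as a lookup table. This is a minimal sketch: the names SECOND_REQUESTS and plan_second_requests and the request labels are hypothetical and do not appear in the present application; only the sequences themselves are taken from the methods described above.

```python
from typing import Dict, List

# Hypothetical labels for the second requests; the order of each list follows
# the third to seventh information processing methods of the fourth aspect.
SECOND_REQUESTS: Dict[str, List[str]] = {
    "Get": ["clustering", "address_acquisition", "storage_access", "search"],
    "Get_With_Index": ["address_acquisition", "storage_access", "search"],
    "Get_Index": ["clustering"],
    "Put": ["clustering", "address_allocation", "storage_access"],
    "Put_With_Index": ["address_allocation", "storage_access"],
}


def plan_second_requests(api_name: str) -> List[str]:
    """Return the ordered second requests generated for one first request."""
    if api_name not in SECOND_REQUESTS:
        raise ValueError("unknown API: " + api_name)
    # Return a copy so callers cannot modify the table.
    return list(SECOND_REQUESTS[api_name])
```

In this sketch, a first request that already carries an index value simply omits the clustering step, which matches the difference between Put(Key) and Put_With_Index(Key, Index) described above.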
According to a fifth aspect of the present application, there is provided a first information processing system according to the fifth aspect of the present application, comprising: a first module for generating one or more second requests in response to receiving a first request indicating a keyword (Key), the second requests including one or more of a clustering request, a search request, an address allocation request, an address acquisition request, and an access request of a storage device; and a second module for generating the processing result of the first request according to the processing results of the one or more second requests.
According to a sixth aspect of the present application, there is provided a first information processing system according to the sixth aspect of the present application, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program: in response to receiving a first request indicating a keyword (Key), generating one or more second requests, the second requests including one or more of a clustering request, a search request, an address allocation request, an address acquisition request, and an access request of a storage device; and generating a processing result of the first request according to the processing results of the one or more second requests.
According to a seventh aspect of the present application, there is provided a first information processing system according to the seventh aspect of the present application, comprising a host, an accelerator, a mapping manager, and a storage device, wherein the accelerator, the mapping manager, and the storage device are each coupled to the host; the host generates a second request according to the first request, wherein the second request comprises one or more of a clustering request, a search request, an address allocation request, an address acquisition request and an access request of the storage device; and the host uses the second request to operate at least one of the accelerator, the mapping manager and the solid state disk so as to complete the processing of one or more second requests and obtain the processing result of the first request.
The first information processing system according to the seventh aspect of the present application provides the second information processing system according to the seventh aspect of the present application, wherein the application of the host generates the first request; the management program of the host generates a second request according to the first request; the first request comprises a read request or a write request, and the first request carries a keyword or the keyword and an index value corresponding to the keyword.
According to the first or second information processing system of the seventh aspect of the present application, there is provided a third information processing system of the seventh aspect of the present application, wherein if the first request is a request that is generated by calling a Put_With_Index(Key, Index) API and carries a keyword and a first index value, the host generates the address allocation request and an access request of the storage device; or, if the first request is a write request that is generated by calling a Put(Key) API and carries a keyword, the host generates the clustering request, the address allocation request, and an access request of the storage device.
According to one of the first to third information processing systems of the seventh aspect of the present application, there is provided a fourth information processing system of the seventh aspect of the present application, wherein if the first request is a read request that is generated by calling a Get_With_Index(Key, Index) API and carries a keyword and an index value, the host generates the address acquisition request, the access request of the storage device, and the search request; or, if the first request is a read request that is generated by calling a Get(Key) API and carries a keyword, the host generates the clustering request, the address acquisition request, the access request of the storage device, and the search request; or, if the first request is a read request generated by calling a Get_Index() API, the host generates only the clustering request.
According to one of the first to third information processing systems of the seventh aspect of the present application, there is provided a fifth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes the address allocation request and an access request of the storage device, the host sends the address allocation request to the mapping manager and acquires the storage address provided by the mapping manager; and the host sends an access request of the storage device carrying the storage address to the solid state disk, to instruct the solid state disk to store the data associated with the keyword according to the storage address.
According to one of the first to third information processing systems of the seventh aspect of the present application, there is provided the sixth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes the clustering request, the address allocation request, and the access request, the host sends the clustering request to the accelerator and obtains a first index value returned by the accelerator; the host sends the address allocation request carrying the obtained first index value and/or the keyword to the mapping manager, and acquires a storage address allocated to the keyword by the mapping manager according to the first index value; and the host sends an access request of the storage device carrying the storage address to the solid state disk, to instruct the solid state disk to store the data associated with the keyword according to the storage address.
According to the first or fourth information processing system of the seventh aspect of the present application, there is provided the seventh information processing system of the seventh aspect of the present application, wherein if the second request generated by the host only includes the clustering request, the host sends the clustering request to the accelerator, and obtains a first index value returned by the accelerator; and the host uses the first index value as a processing result of the first request.
According to the first or fourth information processing system of the seventh aspect of the present application, there is provided an eighth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes the address acquisition request, the access request, and the search request, the host sends the address acquisition request carrying the obtained first index value to the mapping manager, and obtains at least one storage address of at least one feature value corresponding to the first index value returned by the mapping manager; the host accesses the solid state disk by using the at least one storage address and acquires the at least one feature value returned by the solid state disk; the host sends the search request indicating the keyword and the at least one feature value to the accelerator, so as to obtain at least one first feature value corresponding to the keyword, which the accelerator selects from the at least one feature value and returns to the host, wherein the first feature value is a feature value whose distance from the keyword is smaller than a preset threshold, or the at least one first feature value is the M feature values having the smallest distances from the keyword among the at least one feature value, M being greater than or equal to 1; the host uses the at least one first feature value as a result of processing the first request.
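The two alternative selection rules for the first feature value (distance below a preset threshold, or the M smallest distances) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name select_first_feature_values is hypothetical, and Euclidean distance is assumed only for illustration because this passage does not fix a distance metric.

```python
import heapq
import math


def select_first_feature_values(key, feature_values, threshold=None, m=None):
    """Select feature values close to `key` by exactly one of the two rules.

    Assumption: distance is Euclidean (math.dist); the metric is not fixed
    by the passage above.
    """
    if (threshold is None) == (m is None):
        raise ValueError("give exactly one of threshold or m")
    if threshold is not None:
        # Rule 1: keep every feature value whose distance is below the threshold.
        return [v for v in feature_values if math.dist(key, v) < threshold]
    if m < 1:
        raise ValueError("m must be greater than or equal to 1")
    # Rule 2: keep the M feature values with the smallest distances.
    return heapq.nsmallest(m, feature_values, key=lambda v: math.dist(key, v))
```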
According to the first or fourth information processing system of the seventh aspect of the present application, there is provided a ninth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes the clustering request, the address obtaining request, the access request, and the search request, the host sends the clustering request to the accelerator and obtains a first index value returned by the accelerator; the host sends the address acquisition request carrying the acquired first index value to the mapping manager, and acquires the at least one storage address of at least one characteristic value corresponding to the first index value returned by the mapping manager; the host accesses the solid state disk by using the at least one storage address to acquire the at least one characteristic value returned by the solid state disk; the host sends the search request indicating the keyword and the at least one feature value to the accelerator, and obtains the at least one first feature value returned by the accelerator, where the first feature value is a feature value whose distance from the keyword is smaller than a preset threshold or the at least one first feature value is M feature values whose distances from the keyword are minimum in the at least one feature value, and M is greater than or equal to 1; the host uses the at least one first characteristic value as a result of processing the first request.
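The four-step read flow described above (clustering, address acquisition, access, search) can be illustrated with in-memory stand-ins for the accelerator, the mapping manager, and the solid state disk. This is a minimal sketch, not the claimed implementation: handle_get and its parameters are hypothetical, index values and feature values are modeled as tuples of floats, and Euclidean distance is an assumption.

```python
import heapq
import math


def handle_get(key, index_values, address_map, ssd, m=1):
    """Toy read flow; dictionaries stand in for the mapping manager and SSD.

    address_map: index value -> list of storage addresses (mapping manager).
    ssd:         storage address -> stored feature value (solid state disk).
    """
    # 1. Clustering request: the accelerator returns the nearest index value.
    first_index = min(index_values, key=lambda idx: math.dist(key, idx))
    # 2. Address acquisition request: the mapping manager returns addresses.
    addresses = address_map[first_index]
    # 3. Access request: read the feature values stored at those addresses.
    feature_values = [ssd[address] for address in addresses]
    # 4. Search request: the accelerator returns the M nearest feature values.
    return heapq.nsmallest(m, feature_values, key=lambda v: math.dist(key, v))
```

Each step consumes the result of the previous one, which is why the host in this flow cannot issue the four second requests independently of one another.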
According to one of the first to third and fifth to sixth information processing systems of the seventh aspect of the present application, there is provided the tenth information processing system of the seventh aspect of the present application, wherein the host determines that the processing of the first request is complete in response to receiving, from the solid state disk, completion information indicating that storage of the keyword is complete.
According to one of the first to tenth information processing systems of the seventh aspect of the present application, there is provided the eleventh information processing system of the seventh aspect of the present application, wherein the interface function carries at least one of a type of the request, the keyword, and the first index value.
According to one of the first to eleventh information processing systems of the seventh aspect of the present application, there is provided the twelfth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes at least two of the clustering request, the search request, the address allocation request, the address acquisition request, and the access request, the host generates the at least two requests at one time according to the first request; or the host generates the at least two requests in multiple stages according to the first request.
According to the twelfth information processing system of the seventh aspect of the present application, there is provided the thirteenth information processing system of the seventh aspect of the present application, wherein if the second request generated by the host includes the clustering request, the address allocation request, and the access request, the host generates the clustering request according to the first request and sends the clustering request to the accelerator; in response to the clustering request, the accelerator determines a first index value corresponding to the keyword and returns the first index value to the host; after receiving the first index value, the host generates the address allocation request according to the first request and the first index value, and sends the address allocation request to the mapping manager; in response to the address allocation request, the mapping manager allocates a storage address for the keyword according to the first index value and returns the storage address to the host, wherein the storage address is used for accessing the solid state disk; after receiving the storage address, the host generates an access request of the solid state disk according to the first request and the storage address, and sends the access request of the solid state disk to the solid state disk; in response to the access request of the solid state disk, the solid state disk stores the keyword according to the storage address.
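The staged write flow described above, in which each second request is generated only after the previous result has been returned, can be sketched as follows. This is a minimal sketch under stated assumptions: ToyMappingManager and handle_put are hypothetical names, the sequential address allocation policy is invented purely for illustration, and Euclidean distance is assumed.

```python
import math


class ToyMappingManager:
    """Hypothetical mapping manager that hands out sequential addresses."""

    def __init__(self):
        self.next_address = 0
        self.addresses_by_index = {}

    def allocate(self, index_value):
        # Record the new address under the index value it was allocated for.
        address = self.next_address
        self.next_address += 1
        self.addresses_by_index.setdefault(index_value, []).append(address)
        return address


def handle_put(key, index_values, mapping_manager, ssd):
    """Toy write flow: clustering, then address allocation, then SSD write."""
    # Stage 1: clustering request; the accelerator returns the first index value.
    first_index = min(index_values, key=lambda idx: math.dist(key, idx))
    # Stage 2: address allocation request, built from the returned index value.
    address = mapping_manager.allocate(first_index)
    # Stage 3: access request; the solid state disk stores the keyword.
    ssd[address] = key
    return first_index, address
```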
According to an eighth aspect of the present application, there is provided a first information processing method according to the eighth aspect of the present application, comprising: generating a second request according to the first request, wherein the second request comprises one or more of a clustering request, a search request, an address allocation request, an address acquisition request and an access request of the storage device; and operating at least one of the accelerator, the mapping manager and the solid state disk by using the second request to complete the processing of one or more second requests so as to obtain the processing result of the first request.
According to a ninth aspect of the present application, there is provided a first information processing system according to the ninth aspect of the present application, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program: generating a second request according to the first request, wherein the second request comprises one or more of a clustering request, a search request, an address allocation request, an address acquisition request and an access request of the storage device; and operating at least one of the accelerator, the mapping manager and the solid state disk by using the second request to complete the processing of one or more second requests so as to obtain the processing result of the first request.
The management and driving of the accelerator, the mapping manager, and the solid state disk are implemented by the processor, so that interaction between the host and the processor is reduced and processing efficiency is improved.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some of the embodiments described in the present application, and other drawings can be obtained by those skilled in the art from these drawings.
FIG. 1A is a block diagram of a prior art memory device;
FIG. 1B is a schematic diagram of a prior art FTL table;
FIG. 2 is a schematic structural diagram of an accelerator for cluster computation according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a distance calculation module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a result processing module merging calculation results according to an embodiment of the present application;
FIG. 5A is a schematic structural diagram of an information processing system according to an embodiment of the present application;
FIGS. 5B-5F are schematic flow diagrams of processing performed by the information processing system according to embodiments of the present application;
FIG. 6 illustrates a host 600 provided in accordance with yet another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in FIG. 2, an embodiment of the present application provides an accelerator 200 for cluster computation. The accelerator 200 is a hardware accelerator that provides functions such as clustering and searching. In the clustering function, the accelerator acquires a key value (Key) and a specified number of index values (Index), and outputs the index value closest to the acquired key value. The index values represent a plurality of existing categories (for example, X categories, X being a positive integer), and the index value output by the accelerator as closest to the acquired key value is the index value having the smallest distance among the X distances obtained by comparing the key value with each of the X index values.
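The clustering function described above amounts to computing X distances and taking the minimum. The following is a minimal sketch, assuming that the key value and the index values are vectors of equal length and that the distance is Euclidean; the metric and the function name nearest_index are assumptions made for illustration and are not specified in this passage.

```python
import math


def nearest_index(key, index_values):
    """Return (position, distance) of the index value closest to `key`.

    Assumption: Euclidean distance between equal-length vectors.
    """
    if not index_values:
        raise ValueError("at least one index value is required")
    # One distance per index value, i.e. X distances in total.
    distances = [math.dist(key, index_value) for index_value in index_values]
    # Position (0-based) of the smallest of the X distances.
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]
```

Returning the position rather than the vector itself reflects the use of the index value as a category identifier in the rest of the description.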
In the search function, the accelerator acquires a key value and a specified number (for example, N, N being a positive integer) of feature values (Vector), and outputs the M (M being a positive integer, M < N) feature values closest to the acquired key value. The N feature values belong to the same category and are associated with the same index value. Optionally, the N feature values are all of the feature values of the specified category, and the accelerator finds the M feature values closest to the acquired key value among all of the feature values of the specified category. As in the clustering function, the accelerator obtains the proximity of the key value to each feature value by calculating the distance.
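The search function described above can be sketched as a selection of the M smallest of N distances. This is a minimal sketch under the same assumptions as before (vectors of equal length, Euclidean distance); the function name search_top_m is hypothetical.

```python
import heapq
import math


def search_top_m(key, feature_values, m):
    """Return the M feature values closest to `key`, nearest first.

    Assumption: Euclidean distance; the metric is not fixed by the passage.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    # heapq.nsmallest avoids fully sorting all N feature values.
    return heapq.nsmallest(m, feature_values, key=lambda v: math.dist(key, v))
```

Keeping only M candidates while scanning the N feature values is also the natural shape for a hardware pipeline, although how the accelerator 200 merges results is described elsewhere in the application.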
The accelerator 200 includes an arbitration module 201, a data handling module 202, and a distance calculation module 203.
An external unit using the accelerator 200 (e.g., a host, a processor running software or firmware, or a user accessing the accelerator over a network) sends an acceleration request to the accelerator 200 and obtains the result of processing the acceleration request from the accelerator 200. The arbitration module 201 of the accelerator 200 sends a data transfer request to the data handling module 202 according to the received acceleration request. As an example, an acceleration request indicating the clustering function carries a key value. The clustering function also requires X index values. For example, the arbitration module 201 of the accelerator is aware of the storage locations of the X index values. Optionally, the external unit configures the accelerator 200 to indicate the X index values or their storage locations to the accelerator 200. Still optionally, the external unit further indicates to the accelerator 200 that the storage locations and/or the number of the index values have been updated, so that the arbitration module 201 obtains some or all of the updated index values.
As yet another example, an acceleration request indicating the search function carries a key value and an index value corresponding to the key value. In addition to the key value and the index value, the search function requires the N feature values corresponding to the key value and its index value. For example, the arbitration module 201 is aware of the storage locations of the N feature values. As another example, the external unit configures the accelerator 200 to indicate the N feature values or their storage locations to the accelerator 200. As another example, the external unit may also indicate to the accelerator 200 that the storage locations and/or the number of the N feature values have been updated, so that the arbitration module 201 obtains some or all of the updated feature values.
According to an embodiment, when the first request carries the key value, the data handling module 202 moves the key value and X index values from a double data rate synchronous dynamic random access memory (DDR SDRAM) to the distance calculation module 203 according to the second request, where X is an integer greater than or equal to 2; the distance calculation module 203 calculates the distance between the key value and each of the X index values to obtain a first index value, where the first index value is the index value having the minimum distance from the key value among the X index values.
It should be noted that, in the embodiment of the present application, the first request is sent by the host and is a read request or a write request. The arbitration module 201 does not receive the first request sent by the host; it receives a new request, such as a clustering request or a search request, that the firmware generates after receiving the first request. The write request is, for example, an access to the accelerator generated by calling the Put(Key) or Put_With_Index API (Application Programming Interface) function. Read requests include, for example, accesses to the accelerator generated by calling the Get(Key), Get_Index(Key), and Get_With_Index APIs. The parameter Key of an API function is a key value, and the parameter Index represents an index value.
For example, when the first request carries a key value and a first index value, the first request is an access request to the accelerator generated through Put_With_Index(Key, Index) or Get_With_Index(Key, Index); when the first request carries a key value and does not carry a first index value, the first request is an access request to the accelerator generated through Put(Key), Get(Key), or Get_Index(Key). The first request is issued by the host and reaches the accelerator 200 at the hardware layer. By way of example, the accelerator 200 receives a first request that has been processed by the firmware layer, such as a clustering request or a search request; for example, the accelerator 200 receives a request that does not carry the key value and/or the first index value but indicates the address of the key value and/or the first index value in a memory (e.g., a memory external to the accelerator 200 and coupled to the accelerator 200).
In the embodiment of the present application, the arbitration module 201 in the accelerator 200 is connected to the firmware and receives accesses from the firmware. Optionally, the arbitration module 201 provides multiple channels to the firmware layer, each channel logically processing acceleration requests independently and/or in parallel. Thus, from the perspective of the firmware, the accelerator 200 provides a plurality of logical accelerators (one logical accelerator per channel), and the firmware indicates which channel provided by the accelerator 200 is used for an acceleration request.
The arbitration module 201 can receive the requests sent by the firmware layer to the channels in any of the following modes:
Mode 1: the arbitration module 201 receives requests from each channel by round-robin arbitration;
Mode 2: the arbitration module 201 receives requests from each channel by weighted round-robin arbitration;
Mode 3: the arbitration module 201 receives requests from each channel by arbitration in a preset order.
The arbitration modes of the arbitration module 201 are explained below using an example with 4 channels: channel No. 1, channel No. 2, channel No. 3, and channel No. 4.
For mode 1, round-robin scheduling is implemented using a round-robin (RoundRobin) algorithm. The arbitration module 201 starts by receiving a request from channel No. 1 and processes the request if one is received; it then receives from channel No. 2 the next time, from channel No. 3 the third time, from channel No. 4 the fourth time, and returns to channel No. 1 the fifth time.
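Mode 1 can be sketched as a cursor that advances by one channel on every poll, whether or not the visited channel holds a request. This is a minimal software sketch of the scheduling order only: the class RoundRobinArbiter and its methods are hypothetical, and queues stand in for the hardware channels.

```python
from collections import deque


class RoundRobinArbiter:
    """Hypothetical software model of mode 1; channels are numbered from 1."""

    def __init__(self, num_channels):
        self.queues = [deque() for _ in range(num_channels)]
        self.cursor = 0  # position of the next channel to visit

    def submit(self, channel_no, request):
        self.queues[channel_no - 1].append(request)

    def poll(self):
        """Visit the next channel in turn; return (channel_no, request or None)."""
        position = self.cursor
        # Advance the cursor unconditionally so every channel gets its turn.
        self.cursor = (position + 1) % len(self.queues)
        queue = self.queues[position]
        request = queue.popleft() if queue else None
        return position + 1, request
```

With 4 channels, five consecutive polls visit channels 1, 2, 3, 4, and then 1 again, as in the example above.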
For mode 2, weighted round-robin scheduling is implemented using a weighted round-robin (WeightedRoundRobin) algorithm. The arbitration module 201 polls the 4 channels, calculates the current weight of each of the 4 channels and the total weight of the 4 channels, and then selects the channel with the largest current weight among the 4 channels to receive the request. For example, suppose the current weights of the 4 channels are: channel No. 1, weight 1; channel No. 2, weight 3; channel No. 3, weight 2; and channel No. 4, weight 4. Channel No. 4, which has the largest weight of 4, is selected. After channel No. 4 is selected, the weights of the 4 channels become: channel No. 1, weight 1; channel No. 2, weight 3; channel No. 3, weight 2; and channel No. 4, weight -6, which is consistent with subtracting the total weight of 10 from the weight of the selected channel. At the start of the next poll, the weights of the 4 channels are: channel No. 1, weight 2; channel No. 2, weight 6; channel No. 3, weight 4; and channel No. 4, weight -2, which is consistent with increasing each channel's weight by its initial weight (1, 3, 2, and 4, respectively).
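The weight values in the example above (4 becoming -6, then 2, 6, 4, -2 at the next poll) match the update rule commonly called smooth weighted round-robin. The following minimal sketch reproduces those numbers under that assumption; the passage itself does not name this variant, and the class and method names are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Sketch of mode 2, assuming the smooth weighted round-robin rule."""

    def __init__(self, weights):
        self.weights = dict(weights)  # channel number -> configured weight
        self.current = {channel: 0 for channel in self.weights}
        self.total = sum(self.weights.values())

    def select(self):
        """Run one poll; return (chosen, weights before, weights after)."""
        # Every channel's current weight grows by its configured weight.
        for channel, weight in self.weights.items():
            self.current[channel] += weight
        before = dict(self.current)
        # The channel with the largest current weight receives the request.
        chosen = max(self.current, key=self.current.get)
        # The chosen channel is penalized by the total weight.
        self.current[chosen] -= self.total
        return chosen, before, dict(self.current)
```

With weights 1, 3, 2, and 4, ten consecutive selections in this sketch pick channel No. 1 once, channel No. 2 three times, channel No. 3 twice, and channel No. 4 four times, so each channel is served in proportion to its weight without long runs on the heaviest channel.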
For mode 3, the arbitration module 201 receives requests in a preset order. For example, if the preset order is channel No. 3 → channel No. 1 → channel No. 4 → channel No. 2, the arbitration module 201 accesses channel No. 3 first; if no request is waiting to be processed in channel No. 3, the arbitration module 201 goes on to access channel No. 1. It should be noted that the arbitration module 201 starts from channel No. 3 each time.
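Mode 3 behaves like a fixed-priority scan that restarts from the head of the preset order on every poll. The following is a minimal sketch: the function name fixed_order_poll is hypothetical, and queues keyed by channel number stand in for the hardware channels.

```python
from collections import deque


def fixed_order_poll(queues, order):
    """Return (channel_no, request) from the first non-empty channel in `order`.

    The scan always restarts from order[0], so earlier channels in the
    preset order take precedence. Returns None if every channel is empty.
    """
    for channel_no in order:
        if queues[channel_no]:
            return channel_no, queues[channel_no].popleft()
    return None
```

Because the scan always restarts from the first channel in the order, a channel that is continuously busy can delay the channels after it; this is the usual trade-off of fixed-order arbitration compared with modes 1 and 2.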
After receiving a clustering request or a search request from the plurality of channels, the arbitration module 201 sends a data transfer request to the data handling module 202. The data transfer request instructs the data handling module 202 to move data from the external memory to the distance calculation module 203.
Since the first request is Put(Key), Put_With_Index(Key, Index), Get_Index(Key), Get(Key), or Get_With_Index(Key, Index), the arbitration module 201 receives a search request or a clustering request depending on the first request. Accordingly, the second request issued by the arbitration module 201 according to the search request or the clustering request also covers several cases; that is, the data to be moved differs depending on what the first request indicates.
TABLE 1
Figure BDA0002203846910000121
As shown in Table 1, when the first request is Put(Key) or Get(Key), the data transfer module 202 needs to transfer the key value and X index values from the DDR, and when the first request is Get(Key) or Get_With_Index, the data transfer module 202 needs to transfer the key value and a plurality of feature values from the DDR. It should be noted that, when the first request is Get(Key), the arbitration module 201 receives the clustering request and the search request one after the other, rather than receiving both at the same time. It will be appreciated that neither the X index values nor their storage locations are indicated in the request; the arbitration module or the data transfer module already knows the X index values or where they are stored.
The key value (Key), the index value (Index), and the feature value (Vector) are explained here. Information processing systems process a variety of data. Videos, pictures, text files, and the like are unstructured data. Unstructured data is inconvenient to retrieve and use, so it is structured for ease of retrieval or use. For example, the original data is tagged to indicate structured information such as its source, format, storage location, and required access rights. As yet another example, the content of the raw data is analyzed, e.g., extracting thumbnails and summaries from videos or pictures; extracting objects such as people and cars; extracting features of people such as facial features, height, gender, and age; and extracting features of cars such as brand and license plate. The labels, features, and so on associated with the raw data are all structured information generated by the structuring process. Structured information is easily retrieved through prior-art database or search engine techniques, or through other search techniques that exist now or may emerge in the future.
One or more labels and/or features in the structured information are used as keys, and the structured information and/or the data associated with it are used as values, so that unstructured data can be recorded, accessed, and retrieved through a KV storage device. Chinese patent application No. 2018102074169, entitled "KV stored key and value generation method and apparatus", provides a method for generating keys using machine learning components.
Processing data, such as a picture of a human face, with a machine learning component yields a feature value (also referred to as a vector) associated with that particular picture. A feature value is generated by the machine learning component for each of a plurality of pictures. A feature value is smaller than the data it describes, for example, 64 bytes to 4 KB.
In a KV storage system, an input key value (key) is used to retrieve data (value). To facilitate retrieval and data acquisition, in the embodiment of the present application the key value, the index value, and the feature value all serve as indexes for data; that is, all three may be regarded as feature values. They differ in origin: the index value is configured in advance, the key value is input to the host by a user and carried in the first request, and the feature value is calculated from the data. The form of the feature value depends on the kind of data to be stored in the information processing system. For example, if the data to be stored is image data, the feature value is a vector obtained by processing the image. The manner in which image data is processed to obtain feature values can be found in the prior art and is not limited herein.
Clustering is performed on a plurality of feature values, so that a plurality of groups of feature values is obtained. Each group includes 1 or more feature values. Each group of feature values is represented by an index value. For example, a center value (i.e., an average value) obtained by averaging the feature values belonging to the same group is used as the index value representing the group. Conversely, a feature value whose distance from the index value is smaller than a preset threshold may be regarded as belonging to the group represented by that index value.
The distance between feature values is the distance between the vectors they represent, and the distance may be a Euclidean distance or a cosine distance. In one example, the index value is generated based on a plurality of feature values; in yet another example, multiple index values may be preconfigured when there is no data at all in an information processing system using accelerator 200.
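As a small illustration of the two statements above (an index value formed as the average of a group, and group membership decided by a distance threshold), consider the sketch below. It assumes feature values are plain lists of numbers and uses the Euclidean distance; the function names and the threshold are hypothetical and chosen only for the example.

```python
import math


def centroid(feature_values):
    """Average equally sized vectors component by component to obtain
    a center value that can serve as the group's index value."""
    count = len(feature_values)
    return [sum(component) / count for component in zip(*feature_values)]


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def belongs_to_group(feature_value, index_value, threshold):
    """A feature value belongs to the group if its distance to the
    index value is smaller than the preset threshold."""
    return euclidean(feature_value, index_value) < threshold
```

For instance, the center of `[0, 0]` and `[2, 2]` is `[1.0, 1.0]`, and with a threshold of 1.5 the feature value `[1, 2]` falls inside that group while `[4, 5]` does not.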
The second request carries the following information:
when the acceleration request is a clustering request, the second request carries the key value and a first physical address of the X index values, or the key value and a pointer to the first physical address; when the acceleration request is a search request, the second request carries the key value and a second physical address of the N feature values, or the key value and the N index values, or the key value and a pointer to the second physical address;
where the acceleration request is a cluster request or a search request, the second request also carries a physical address for storing the results of the accelerator output.
It should be noted that the key value referred to in the embodiments of the present application may be a set of values and does not necessarily represent a single value. The first physical address and the second physical address are both addresses in the external memory, so that the data transfer module 202 can move data from the external memory to the distance calculation module 203.
Depending on the data it receives, the distance calculation module 203 implements one of two functions: a clustering function or a search function.
Specifically, when the first request carries the key value and the first index value, the data transfer module 202 transfers the key value and N feature values to the distance calculation module 203, and the distance calculation module 203 implements the search function. In this case, the distance calculation module 203 calculates the distance between the key value and each of the N feature values and obtains, from the N feature values, the M feature values with the smallest distance from the key value, where N and M are integers greater than or equal to 1, and M is less than or equal to N. It should be noted that the N feature values are all the feature values in the category corresponding to the first index value.
Optionally, to obtain the M first feature values with the smallest distances, the distance calculation module 203 calculates the distance between each of the N feature values and the key value to obtain N first distances; the distance calculation module 203 sorts the N first distances in ascending order to obtain a sorted distance queue; and the distance calculation module 203 selects the first feature values corresponding to the first M distances in the sorted queue.
Taking N equal to 1000 as an example, the distance calculation module 203 performs 1000 distance calculations against the key value to obtain 1000 distances, sorts the 1000 distances in ascending order to obtain a distance queue of length 1000, then selects the M smallest distances from the queue and obtains the M first feature values corresponding to them.
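The calculate-all, sort, then take-the-first-M procedure can be sketched in software as follows. This is an illustration only: the Euclidean distance is assumed for concreteness, feature values are assumed to be lists of numbers, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_m_by_full_sort(key, feature_values, m):
    """Compute all N first distances, sort them in ascending order, and
    return the M feature values closest to the key."""
    # Pair each distance with the position of its feature value so the
    # feature value can be recovered after sorting.
    distances = [(euclidean(key, fv), pos) for pos, fv in enumerate(feature_values)]
    distances.sort()
    return [feature_values[pos] for _, pos in distances[:m]]
```

This version needs a distance queue as long as N, which is the cache cost that a length-M queue avoids.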
In an optional embodiment, to reduce the cache resources required by the accelerator 200, the length of the distance queue is set to M in advance when obtaining the M first feature values; that is, the distance queue can cache only M first distances. Each time the distance calculation module 203 calculates a first distance, it inserts that distance into the length-M distance queue by insertion sorting, so that the M smallest first distances are available directly once all N distances have been calculated. In the embodiment of the application, insertion sorting means that each newly calculated first distance is compared with the first distances already in the distance queue to determine its position in the queue. For example, if the currently calculated first distance is 0.8, the first distance at the 3rd position of the queue is 0.5, and the first distance at the 4th position is 0.82, then the currently calculated first distance takes the 4th position and the first distance of 0.82 is moved back to the 5th position. As another example, if the currently calculated first distance is 10 and the first distance at the M-th position of the queue is 8.7, the currently calculated first distance is too large to be placed in the queue.
In the embodiment of the present application, M is less than or equal to 256; in an optional embodiment, the value of M may be adjusted as required, for example to 1024, 50, or 128. Optionally, when the N first distances are sorted by insertion sorting, the distance queue always maintains 256 entries. It should be noted that, because N may be less than 256, or because 256 first distances may not yet have been calculated, the distance queue may contain empty entries.
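The bounded insertion described above can be sketched with Python's standard `bisect` module. The list-based queue and the function name are assumptions made for illustration; the hardware queue is not described at this level of detail in the text.

```python
import bisect


def insert_bounded(queue, distance, m):
    """Insert `distance` into the ascending list `queue`, keeping at most
    `m` entries. Returns True if the distance was kept, False if it was
    too large to enter a full queue."""
    if len(queue) >= m and distance >= queue[-1]:
        # Larger than (or equal to) the entry at the M-th position: not kept.
        return False
    # Find the position by comparison and shift later entries back by one.
    bisect.insort(queue, distance)
    if len(queue) > m:
        # Drop the entry that was pushed past the M-th position.
        queue.pop()
    return True
```

With a full queue `[0.1, 0.3, 0.5, 0.82, 0.9]` and M = 5, inserting 0.8 places it at the 4th position and moves 0.82 to the 5th, matching the first example; inserting 10 into a full queue whose last entry is 8.7 leaves the queue unchanged, matching the second.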
When the first request carries the key value but not the first index value, the acceleration request received by the arbitration module 201 is a clustering request. The data transfer module 202 obtains X index values, which are pre-stored in the external memory (the DDR), and transfers the key value and the X index values to the distance calculation module 203, which implements the clustering function. In this case, the distance calculation module 203 calculates the distance between the key value and each of the X index values to obtain a first index value, where the first index value is the index value with the smallest distance from the key value among the X index values, and X is an integer greater than or equal to 2.
Optionally, when performing distance calculation on the key value, the distance calculation module 203 calculates the distance between each of the X index values and the key value to obtain X second distances;
the distance calculation module 203 compares the X second distances pairwise and determines the index value corresponding to the smallest second distance as the first index value.
Specifically, the distance calculation module 203 may calculate all X second distances first and then compare them pairwise to obtain the first index value. Alternatively, the distance calculation module 203 may compare second distances while calculating them: for example, after calculating 2 second distances, it compares them and retains the smaller one; it then calculates the 3rd second distance, compares it with the second distance retained before, and again retains the smaller one; and so on, until all X second distances have been calculated and compared. The index value corresponding to the finally retained second distance is the first index value.
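The compare-while-calculating variant amounts to keeping a running minimum, which can be sketched as below. The Euclidean distance is assumed for concreteness and the function names are hypothetical; on a tie this sketch keeps the earlier index value, which is a choice the text does not specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index_value(key, index_values):
    """Return (position, distance) of the index value closest to the key,
    retaining only the smallest second distance seen so far."""
    best_pos, best_dist = None, None
    for pos, index_value in enumerate(index_values):
        second_distance = euclidean(key, index_value)
        # Keep the smaller of the retained distance and the new one.
        if best_dist is None or second_distance < best_dist:
            best_pos, best_dist = pos, second_distance
    return best_pos, best_dist
```

Only one distance is retained at any time, so no queue of X entries is needed for the clustering function.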
It should be noted that, in the embodiment of the present application, the distance calculation is implemented as a Euclidean distance calculation or a cosine distance calculation, where the Euclidean distance is calculated using formula (1) and the cosine of the included angle is calculated using formula (2), as follows:
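$$d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} \quad (1)$$
$$\cos\theta = \frac{x_1 y_1 + x_2 y_2 + \dots + x_n y_n}{\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\,\sqrt{y_1^2 + y_2^2 + \dots + y_n^2}} \quad (2)$$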
where d is the Euclidean distance, cos θ is the cosine distance (the cosine of the included angle θ), the vector {x1, x2, …, xn} is the key value, and the vector {y1, y2, …, yn} is the index value or the feature value. It should be noted that the vectors in the formulas are n-dimensional (n is an integer greater than or equal to 1). The key value, index value, or feature value in this application may take the form of an n-dimensional vector but is not limited to that form; it may also be, for example, a number or a character.
The first distance and the second distance in this embodiment are both calculated using formula (1) or formula (2), which may be selected as required; they may also be calculated using a method in the prior art, which is not limited herein.
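For reference, the two distance measures named above can be written as short functions. This sketch assumes the key value and the index or feature value are equal-length lists of numbers with non-zero length (modulus); the function names are chosen for illustration.

```python
import math


def euclidean_distance(x, y):
    """Formula (1): square root of the summed squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_of_angle(x, y):
    """Formula (2): dot product divided by the product of the vector lengths.

    Assumes neither vector is all zeros (the division would fail otherwise).
    """
    dot = sum(a * b for a, b in zip(x, y))
    len_x = math.sqrt(sum(a * a for a in x))
    len_y = math.sqrt(sum(b * b for b in y))
    return dot / (len_x * len_y)
```

Note that the two measures order candidates in opposite directions: a smaller Euclidean distance means a closer match, whereas a larger cosine value means a smaller included angle.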
Optionally, the accelerator 200 further includes a normalization module 204. Before the distance calculation module 203 receives the key value and the X index values, the normalization module 204 performs a normalization operation on the key value and the X index values and sends the normalized key value and the normalized X index values to the distance calculation module 203; or,
before the distance calculation module 203 receives the key value and the N feature values, the normalization module 204 performs a normalization operation on the key value and the N feature values and sends the normalized key value and the normalized N feature values to the distance calculation module 203.
The normalization module 204 adopts the following formula (3) when performing the normalization operation:
$$\hat{A} = \frac{A}{\mathrm{Len}(A)} \quad (3)$$
where A represents the input vector and Len (A) is the length of vector A (i.e., the modulus of vector A).
It should be noted that when the normalization module 204 performs the normalization operation on the key value and the X index values, or on the key value and the N feature values, it normalizes the key value and separately normalizes each of the X index values or each of the N feature values. When the distance calculation uses the cosine of the included angle, normalizing in advance simplifies the calculation and therefore reduces the required computing capacity. This is because the denominator of formula (2), $\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\,\sqrt{y_1^2 + y_2^2 + \dots + y_n^2}$, is in fact the product of the lengths of the two vectors. After normalization the length of each vector becomes 1, so formula (2) reduces to formula (4), as follows:
$$\cos\theta = x_1 y_1 + x_2 y_2 + \dots + x_n y_n \quad (4)$$
where cos θ is the cosine of the included angle θ, the vector {x1, x2, …, xn} is the normalized key value, and the vector {y1, y2, …, yn} is the normalized index value or feature value.
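The simplification can be checked numerically with a short sketch: after each vector is divided by its length, the plain sum of products of formula (4) matches the full cosine expression of formula (2). The function names are illustrative, and the sketch assumes non-zero vectors.

```python
import math


def length(a):
    """Length (modulus) of a vector."""
    return math.sqrt(sum(v * v for v in a))


def normalize(a):
    """Formula (3): divide every component by the vector's length.

    Assumes the vector is not all zeros.
    """
    size = length(a)
    return [v / size for v in a]


def dot(x, y):
    """Formula (4): sum of component-wise products."""
    return sum(a * b for a, b in zip(x, y))


def cosine_full(x, y):
    """Formula (2), evaluated without prior normalization."""
    return dot(x, y) / (length(x) * length(y))
```

Since the key value is normalized once and then compared against many index or feature values, the per-comparison work drops to n multiplications and additions, with no square root or division.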
In one example, the distance calculation module 203 performs cosine distance calculation, so that before the distance calculation module 203 receives the key value and the X index values, the normalization module 204 performs normalization operation on the key value and the X index values, and sends the normalized key value and the normalized X index values to the distance calculation module 203.
In one example, the distance calculation module 203 performs euclidean distance calculation, so that the key value and the X index values are sent to the distance calculation module 203 without performing normalization calculation.
It will be appreciated that the distance calculation, together with the optional normalization calculation, creates a large number of computational tasks. For the clustering function, a distance calculation is performed between the key value and each of the X index values, and a comparison or sorting operation is needed to obtain the index value closest to the key value. For the search function, a distance calculation is performed between the key value and each of the N feature values, and a sorting operation is performed to obtain the M feature values. If the processor performed these tasks by executing a program, they would impose a heavy computational burden on the processor and increase the delay in processing read/write requests. In the embodiment of the present application, the accelerator 200 takes over the calculation operations for the clustering function and the search function, which reduces the burden on the processor; and because the calculation is completed by hardware rather than by executing a program, the processing delay is reduced and the data throughput is increased. Optionally, when the arbitration module 201 obtains requests from one of the plurality of channels, it obtains a plurality of first messages, each corresponding to a first request, at a time, and instructs the plurality of computing units in the distance calculation module 203 to process the plurality of first messages.
Fig. 3 shows a possible structure of the distance calculation module 203. The distance calculation module 203 includes a plurality of computing units (PE) 2031, each of which performs the calculation of formula (1), formula (2), or formula (4) above.
It should be noted that "first request" in this embodiment merely denotes a request sent by the host and does not limit the number of such requests. The host may send multiple requests; the firmware processes the requests sent by the host and sends the processed requests to the arbitration module 201 through multiple channels. The arbitration module 201 may therefore receive multiple requests in each of the channels, and may obtain multiple first messages from one channel, where each first message corresponds to one first request. Because there are multiple first messages, the arbitration module 201 assigns a computing unit to each first message, so that the distance calculation module 203 can process multiple first messages simultaneously; that is, multiple computing units process multiple first messages in parallel, where each first message corresponds to a set of key values.
Specifically, the first request sent by the host carries key values, and it may carry one key value or several. That is, an acceleration request received by the arbitration module 201 includes either the value of a single key value or the values of a group of key values. Further, because the plurality of computing units process the plurality of first messages in parallel, the key values processed by the computing units fall into the following cases:
case 1, each computing unit processes the value of one key;
case 2, each computing unit processes the values of a set of key values;
in case 3, each of a part of the plurality of calculation units processes the numerical value of one key value, and each of another part of the plurality of calculation units processes the numerical value of one set of key values.
For the case 1, for example, the arbitration module 201 obtains 7 first messages from one channel, and each first message only carries a value of one key value, so that the distance calculation module 203 calculates one value carried in the 7 first messages by using 7 calculation units, respectively.
As an example, for case 2, the arbitration module 201 obtains 5 first messages from one channel, and each first message carries a set of values of key values, so that the distance calculation module 203 calculates the set of values carried in the 5 first messages respectively by using 5 calculation units.
As another example, for case 3, the arbitration module 201 obtains 8 first messages from one channel, where 3 first messages all carry a value of one key value, and 5 first messages all carry values of a group of key values, so that the distance calculation module 203 calculates one value carried in 3 first messages respectively by using 3 calculation units, and calculates a group of values carried in 5 first messages respectively by using 5 calculation units.
Optionally, the distance calculating module 203 includes Y calculating units, the key value includes Z sets of numerical values, Y and Z are both integers greater than or equal to 1, and Z is less than or equal to Y.
The distance calculation module 203 performs distance calculation on the Z-set of values according to the X index values by using the Y calculation units, and obtains an index value corresponding to each value in the Z-set of values.
Specifically, the distance calculation module 203 includes Y computing units, and each computing unit processes one set of key values, so the distance calculation module 203 processes at most Y sets of key values simultaneously and the arbitration module 201 obtains at most Y first messages at a time. For example, if Y equals 8 and 12 first messages are waiting in channel 1, the arbitration module 201 obtains only 8 first messages from channel 1, and the remaining 4 messages wait for the arbitration module 201 to process them subsequently.
Optionally, the arbitration module 201 assigns the obtained plurality of acceleration messages to a plurality of computing units. For example, one computing unit is assigned to process one acceleration message. By way of example, each computing unit has a computing unit identifier, which uniquely identifies that computing unit. Referring to fig. 3, the computing units have the computing unit identifiers 2031-1, 2031-2, ……, 2031-Y, respectively. The arbitration module 201 assigns a computing unit identifier to each acquired acceleration message to instruct the computing unit having that identifier to process the acceleration message.
Optionally, the arbitration module 201 allocates a computing unit identifier to each of the plurality of first messages, where the computing unit identifier indicates which computing unit in the distance calculation module 203 is to process the first message carrying that identifier; for example, a first computing unit identifier corresponds to a first computing unit.
Optionally, with continued reference to fig. 3, the distance calculation module 203 further includes Y management units 2032, each of which is coupled to one of the Y calculation units. Each management unit has a computing unit identification corresponding to the computing unit coupled thereto.
The Y management units are all coupled to the data transfer module 202. Each acceleration message that the data transfer module provides to the computing units has a computing unit identifier appended and is provided to all Y management units simultaneously. In response to a received acceleration message, a management unit checks whether the computing unit identifier carried in the acceleration message matches its own computing unit identifier; it acquires only the acceleration messages whose identifier matches its own and provides them to its corresponding computing unit.
The management unit also monitors the working state of the computing unit corresponding to the management unit to know whether the computing unit is idle or not and can process the acceleration message. The management unit only retrieves and forwards acceleration messages for the computing units that can process the acceleration messages.
Optionally, the arbitration module 201 provides Z acceleration messages to the Y management units simultaneously. Each of the Z acceleration messages has a computing unit identifier appended. Each of the Y management units identifies, among the Z acceleration messages, the acceleration message that matches its own computing unit identifier, and provides the acquired acceleration message to its corresponding computing unit.
Still alternatively, the arbitration module 201 provides the Y management units with Z acceleration messages obtained from the same channel (denoted as channel C). Each of the Z acceleration messages has a computing unit identifier and the same channel number appended. Each of the Y management units identifies, among the Z acceleration messages, the acceleration message that matches its own computing unit identifier, and provides the acquired acceleration message to its corresponding computing unit. Further optionally, until the Z acceleration messages have been processed, only subsequent acceleration messages from channel C are provided to the Y management units, and acceleration messages from other channels are not processed.
When the arbitration module 201 sends the second request, it assigns a computing unit identifier to each group of the Z groups of values. Each management unit obtains, from the Z groups of values, the group whose computing unit identifier is the same as its own identifier, and sends the obtained group of values to the computing unit coupled to it.
Specifically, the arbitration module 201 allocates a computing unit identifier to each first message to ensure that the data corresponding to that first message enters the designated computing unit for processing, so that the order of the data is not disturbed when the distance calculation module 203 processes a plurality of first messages. For example, suppose there are 3 first messages in total, numbered 1, 2, and 3, and 6 computing units in the distance calculation module 203, numbered 1, 2, ……, 6. The arbitration module 201 allocates computing unit identifier 1 to first message 1, computing unit identifier 2 to first message 2, and computing unit identifier 3 to first message 3, so that the distance calculation module 203 knows that the data corresponding to first message 1 is processed by computing unit 1 and not, for example, by computing unit 5.
Optionally, if the distance calculation module 203 has not finished processing the plurality of first messages, then even if the number of first messages is less than the number of computing units in the distance calculation module 203 and the distance calculation module 203 has idle computing units, the arbitration module 201 does not instruct the distance calculation module 203 to process messages from the other channels.
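The identifier-based dispatch can be modelled in software as follows. This is a sketch under stated assumptions: the `ManagementUnit` class, the dictionary message format, and the sequential assignment of identifiers 1, 2, 3, … are all illustrative choices, and the hardware broadcast is modelled as a simple loop.

```python
class ManagementUnit:
    """Software stand-in for a management unit coupled to one computing unit."""

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.accepted = []  # messages forwarded to the coupled computing unit

    def offer(self, message):
        """Accept the message only if it carries this unit's identifier."""
        if message["unit_id"] == self.unit_id:
            self.accepted.append(message["payload"])
            return True
        return False


def assign_unit_ids(first_messages, num_units):
    """Take at most `num_units` messages and tag each with a computing unit
    identifier (1, 2, 3, ...). Returns (tagged batch, messages left waiting)."""
    batch = first_messages[:num_units]
    tagged = [{"unit_id": i + 1, "payload": msg} for i, msg in enumerate(batch)]
    return tagged, first_messages[num_units:]


def broadcast(tagged_messages, management_units):
    """Offer every tagged message to all management units at once."""
    for message in tagged_messages:
        for unit in management_units:
            unit.offer(message)
```

With 3 first messages and 6 management units, unit 1 receives message 1 and unit 5 receives nothing; with 12 waiting messages and 8 units, `assign_unit_ids` tags 8 messages and leaves 4 waiting.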
Specifically, in order to ensure the order of processing requests in multiple channels, the arbitration module 201 does not continue to acquire the first messages in other channels to allow the idle computing units to process even if it acquires the first messages less than the number of computing units in the distance computing module 203.
Optionally, after acquiring the plurality of first messages, the arbitration module 201 aggregates the plurality of first messages, and instructs at least one computing unit in the distance computing module 203 to process the aggregated plurality of first messages.
It should be noted that each computing unit in the distance computing module 203 has a computing capability of processing a set of key values, and if each first message corresponds to only one key value, the computing capability of the computing unit is wasted, so the arbitration module 201 obtains the number of key values corresponding to each first message, and combines a plurality of first messages according to the computing capability of the computing unit, thereby fully utilizing the computing capability of the distance computing module 203.
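One way such merging could work is a greedy packing in arrival order, sketched below. The text does not specify the merging policy, so the greedy rule, the per-unit `capacity` parameter (the number of key values one computing unit can process at once), and the function name are all assumptions made for illustration.

```python
def aggregate_messages(messages, capacity):
    """Greedily merge first messages, in arrival order, into groups whose
    total number of key values does not exceed one computing unit's capacity.

    `messages` is a list of (message_id, key_value_count) pairs.
    Returns a list of groups, each a list of message ids.
    """
    groups, current, used = [], [], 0
    for message_id, key_count in messages:
        # Start a new group when the next message would overflow the unit.
        if current and used + key_count > capacity:
            groups.append(current)
            current, used = [], 0
        current.append(message_id)
        used += key_count
    if current:
        groups.append(current)
    return groups
```

With a capacity of 4 key values, five messages carrying 1, 1, 2, 4, and 1 key values are packed into three groups, so three computing units are occupied instead of five.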
Optionally, referring to fig. 2, the accelerator 200 further includes a result processing module 205, and after the distance calculation module 203 obtains the calculation result, the result processing module 205 merges the calculation result of each calculation unit and sends the merged calculation result to the DDR.
Fig. 4 is a schematic diagram of a result processing module merging calculation results in the embodiment of the present application. As shown in fig. 4, the distance calculation module 203 includes 8 calculation units, the result processing module 205 combines 8 calculation results 401 to 408 into a result queue 409 according to the positions of the calculation units, and sends the data in the result queue 409 to the external memory according to the format of the result queue 409.
Referring to fig. 5A, a second embodiment of the present application provides an information processing system 500, which includes a host 501, a solid state disk 502, an accelerator 503, a mapping manager 504, and a processor 505. The accelerator 503 is the accelerator for cluster computation described in the above embodiment; the solid state disk 502 is, for example, an NVMe SSD, an OCSSD, or a KVSSD; the host 501 is a computer; and the processor 505 is an FPGA chip. The host 501 is coupled to the processor 505 directly or through a network.
The host is coupled to an electronic device such as a camera, switch, server, etc.
For example, an application runs on the host and generates requests. The request types include read requests (Get, Get_Index, or Get_With_Index) and write requests (Put or Put_With_Index); a request generated by the application is the first request. By way of example, an application calls the Get(Key) API to generate a read request that queries on a key (Key), expecting a set of feature values associated with the key or further data associated with that set of feature values. Specifically, the key comes from a feature value generated from a picture input by a user, and the application calls the Get(Key) API expecting a group of pictures similar to the picture input by the user.
As yet another example, an application calls the Get(Key, Index) API to generate a read request that queries on a key (Key) and an index (Index). The key comes from a feature value generated from a picture input by a user, and the index indicates the category (e.g., face or license plate) of pictures that the user wants to search. The application calls the Get(Key, Index) API to get a set of pictures, within the category specified by the index, that are similar to the picture input by the user.
As yet another example, an application calls the Get_Index(Key) API to generate a read request that queries on a key (Key). The application calls the Get_Index(Key) API expecting the picture category associated with the picture input by the user.
Optionally, the first request includes a plurality of key values and a plurality of first index values.
In one embodiment, an application running on host 501 calls a Get/Put API to generate a first request, which is sent to processor 505 for processing. According to the first request, the processor 505 generates one or more of a clustering request, a search request, an address assignment request, an address acquisition request, and a request to access a storage device, and operates one or more of the accelerator 503, the solid state disk 502, and the mapping manager 504 to complete processing of the first request and return the result to the host 501.
In yet another example, an application running on host 501 calls a Get/Put API to generate a first request, and a driver or hypervisor running on host 501 generates a clustering request, a search request, and/or a request to access a storage device based on the first request and operates one or more of the accelerator 503, the solid state disk 502, and the mapping manager 504 via processor 505. The results produced by the accelerator 503, the solid state disk 502, and the mapping manager 504 in processing the requests are provided to the driver of the host 501, and once the driver or hypervisor has obtained the processing result of the first request, it provides that result to the application.
Referring to fig. 5B, a flow chart of the information processing system 500 processing a Get_Index(Key) request is shown. An application running on host 501 issues a Get_Index(Key) request by calling the API, and a driver or hypervisor running on host 501, in response to the call to Get_Index(Key), generates a clustering request and sends it to the processor 505, so that the processor 505 receives the clustering request. The keyword (Key) is carried in the clustering request. Optionally, the host 501 also adds X index values, or addresses indicating the X index values, to the request provided to the processor 505.
After receiving the clustering request, the processor 505 sends a clustering request to the accelerator 503. The clustering request indicates the keyword (Key) and the X index values. Optionally, the processor 505 appends the X index values, or addresses indicating the X index values, to the clustering request. The accelerator 503 operates in the manner described in the above embodiments: it calculates the distance between the keyword (Key) and each of the X index values, and outputs the index value having the smallest distance from the keyword (the first index value) to the processor 505.
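The clustering step can be illustrated in software. The following is a minimal sketch, not the accelerator's actual implementation: the function name `nearest_index_value` is hypothetical, and Euclidean distance is assumed because the passage does not name a distance metric.

```python
import math

def nearest_index_value(key, index_values):
    """Return (position, distance) of the index value closest to `key`.

    Euclidean distance is an assumption of this sketch; the text only
    says that the index value with the smallest distance is selected.
    """
    best_pos, best_dist = None, math.inf
    for pos, index_value in enumerate(index_values):
        dist = math.sqrt(sum((k - v) ** 2 for k, v in zip(key, index_value)))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos, best_dist

# A keyword (feature vector) and X = 3 candidate index values.
key = [1.0, 2.0]
index_values = [[0.0, 0.0], [1.0, 1.5], [5.0, 5.0]]
first_index, distance = nearest_index_value(key, index_values)
print(first_index, distance)
```

Here the position of the winning index value stands in for the "first index value" returned to the processor 505.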
Optionally, after the accelerator 503 writes the first index value to an external memory (not shown), the accelerator 503 may also send first completion information to the processor 505 for characterizing completion of the computation.
The processor 505, after receiving the first completion information, sends the first completion information and the first index value to the host 501. The driver or hypervisor of host 501 provides the first index value to the application that called the API, as the result of processing the Get_Index(Key) request.
Optionally, if the application of the host 501 calls a Get(Key) or Put(Key) API (not shown), the driver or hypervisor of the host 501 likewise generates a clustering request in response and sends it to the processor 505. The clustering request carries a keyword (Key) and X index values. The processor 505 obtains the keyword (Key) indicated by the clustering request. The processor 505 also obtains the X index values and sends a clustering request to the accelerator 503, the clustering request indicating the input key value (Key) and the X index values.
In response to the clustering request, the accelerator 503 provides the processor 505 with a calculation result indicating the first index value corresponding to the key value. Next, the processor 505 provides the key value and the N feature values corresponding to the first index value to the accelerator 503, to instruct the accelerator 503 to perform the function of searching for similar data. The calculation result provided by the accelerator 503 indicates the M feature values that are closest to the key value among the N feature values corresponding to the first index value. The processor 505 provides the M feature values obtained from the accelerator 503 to the host 501 as a response to the clustering request.
In an alternative embodiment, the operations of the driver or hypervisor of the host 501 shown in FIG. 5B are implemented by the processor 505, thereby reducing interaction between the processor 505 and the host 501 and improving processing efficiency.
Fig. 5C is a flowchart illustrating how the information processing system 500 obtains feature values according to an index value. When processing a Get(Key), Get_With_Index(Key, Index), Put(Key), or Put_With_Index(Key, Index) request, the N feature values corresponding to the index (Index) need to be obtained. Under the direction of a driver or hypervisor of the host 501, the processor 505 accesses the mapping manager 504 to acquire, from the index (Index), the address at which the N feature values are stored, and accesses the solid state disk 502 to acquire the N feature values from the acquired address.
After the host 501 sends a Get(Key) or Put(Key) request, the processor 505 first sends a clustering request to the accelerator 503 (see fig. 5B) to obtain the index value (Index) corresponding to the keyword (Key). If the host 501 sends a Get_With_Index(Key, Index) or Put_With_Index(Key, Index) request, the index value (Index) is obtained from the parameters of that request.
Referring to FIG. 5C, the driver or hypervisor of host 501 sends the Index value (Index) to processor 505. The driver or hypervisor of host 501 also indicates to processor 505 the operations to be performed by mapping manager 504 when sending the Index value (Index) to processor 505, depending on whether the application of host 501 previously issued a read request or a write request. The processor 505 sends the index value to the mapping manager 504.
After the mapping manager 504 receives the index value, if the request sent by the application program of the host is a write request, such as Put(Key) or Put_With_Index(Key, Index), the mapping manager 504 allocates a storage address to the input key value according to the index value (Index); the storage address is used for accessing the solid state disk 502. The mapping manager 504 provides the allocated storage address to the processor 505. If the request sent by the host is a read request, such as Get(Key) or Get_With_Index(Key, Index), the mapping manager 504, after receiving the index value (Index), obtains the storage addresses of all feature values corresponding to the index value (Index), for example the N storage addresses of the N feature values in the category represented by the index value (Index), and sends them to the processor 505. The N storage addresses of the N feature values are either N discrete storage addresses or N contiguous storage addresses. When the N storage addresses are contiguous, the storage addresses of the N feature values may be represented by a single storage address, and only that one storage address needs to be acquired.
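The two roles of the mapping manager (allocating an address on a write, and returning all addresses of a category on a read) can be sketched as follows. This is an illustrative model only: the class name `MappingManager`, its method names, and the simple incrementing allocator are assumptions, not details from the source.

```python
from collections import defaultdict

class MappingManager:
    """Toy model of the index-value-to-storage-address mapping."""

    def __init__(self):
        self._addresses = defaultdict(list)  # index value -> storage addresses
        self._next_free = 0                  # assumed: addresses grow from 0

    def allocate(self, index_value):
        """Write path: allocate one storage address under `index_value`."""
        address = self._next_free
        self._next_free += 1
        self._addresses[index_value].append(address)
        return address

    def lookup(self, index_value):
        """Read path: return every storage address under `index_value`."""
        return list(self._addresses.get(index_value, []))

manager = MappingManager()
manager.allocate("face")
manager.allocate("plate")
manager.allocate("face")
print(manager.lookup("face"))   # addresses of all feature values in "face"
```

In this sketch the addresses of one category are discrete. If an implementation kept them contiguous, `lookup` could return a single start address instead, as the passage notes.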
The processor 505 sends the memory address returned by the mapping manager 504 to the host 501. The driver or the hypervisor of the host 501 generates a request for accessing the solid state disk 502 according to the received storage address, and sends the request together with the storage address to the processor 505. If the application of the host 501 previously issued a read request, such as Get (Key) or Get _ With _ Index (Key, Index), the host 501 generates a request to read data from the solid state disk 502.
Processor 505 accesses solid state disk 502 with the memory address and an indication of a read request or a write request.
The solid state disk 502 writes data to or reads data from the received memory address in accordance with the received request. As an example, the data read from the solid state disk is N characteristic values corresponding to the Index value (Index); the data written into the solid state disk is a characteristic value to be written by an application program through calling Put (Key) or Put _ With _ Index (Key, Index).
For a request to read data, the solid state disk 502 sends the read data (N feature values) to the processor 505. The processor 505 transmits the read data to the host 501. According to the received message indicating that the data read is complete, the driver of the host 501 provides the read data to the application program that called the Get(Key) or Get_With_Index(Key, Index) API and indicates that the API call is complete. At this point, the processing of the Get(Key) or Get_With_Index(Key, Index) request previously issued by the host is complete.
For a request to write data, the solid state disk 502 sends a message to the processor 505 indicating that the data write is complete. The processor 505 sends a message to the host 501 (not shown) indicating that the data write is complete. According to the received message indicating that the data write is complete, the driver of the host 501 indicates completion of the call to the application that called the Put(Key) or Put_With_Index(Key, Index) API. At this point, the processing of the Put(Key) or Put_With_Index(Key, Index) request previously issued by the host is complete.
In an alternative example, if the application of the host 501 calls Put(Key) or Get(Key), the process in fig. 5B is executed first to obtain an index value (Index), and after the host 501 receives the index value, the process in fig. 5C is executed.
In an alternative embodiment, the operations of the driver or hypervisor of the host 501 shown in FIG. 5C are implemented by the processor 505, thereby reducing interaction between the processor 505 and the host 501 and improving processing efficiency.
Referring to fig. 5D, host 501 sends a search request to processor 505.
If the request sent by the host is a read request such as Get(Key), the host 501 first sends a clustering request to the accelerator 503 through the processor 505, and the accelerator 503 returns the index value (Index) corresponding to the keyword (Key) (see also fig. 5B). The host 501 then obtains the N feature values (Vector) corresponding to the index value (Index) from the mapping manager 504 and the solid state disk 502 through the processor 505 (see also fig. 5C). Then, as shown in fig. 5D, the host 501 sends a search request to the accelerator 503 through the processor 505. The search request carries the keyword (Key) and the N feature values. For the search request, the host 501 expects the processor 505 to return the M feature values that are closest to the keyword (Key) among the N feature values. If the read request was issued by calling Get_With_Index(Key, Index), the host 501 first obtains the N feature values (Vector) corresponding to the index value (Index) from the mapping manager 504 and the solid state disk 502 through the processor 505 (see also fig. 5C), and then, as shown in fig. 5D, sends a search request to the accelerator 503 through the processor 505.
Referring to fig. 5D, host 501 then sends a search request carrying a Key (Key) and N feature values to processor 505.
After receiving the search request, the processor 505 sends a search request to the accelerator 503, so that the accelerator 503 works in the manner described in the above embodiments to search out, from among the N feature values, the M feature values closest to the keyword (Key).
The accelerator 503 writes the calculation results of the M feature values into the external memory, and the accelerator 503 transmits completion information for characterizing the completion of the search processing to the processor 505.
The processor 505, upon receiving the completion information, sends the completion information and the calculation result to the host 501. At this point, processing of the Get(Key) or Get_With_Index(Key, Index) call made by the application program of the host 501 is complete, and the M feature values are obtained as the result.
In an alternative embodiment, the operations of the driver or hypervisor of the host 501 shown in FIG. 5D are implemented by the processor 505, thereby reducing interaction between the processor 505 and the host 501 and improving processing efficiency.
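The search operation of fig. 5D (selecting the M feature values closest to the keyword out of N) can be sketched in a few lines. This is a software illustration under assumptions: the names `squared_distance` and `search_similar` are hypothetical, and squared Euclidean distance is used as the metric, which the source does not specify.

```python
import heapq

def squared_distance(a, b):
    """Squared Euclidean distance; it orders candidates like the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_similar(query, feature_values, m):
    """Return the m feature values closest to `query`, nearest first."""
    return heapq.nsmallest(
        m, feature_values, key=lambda v: squared_distance(query, v)
    )

# N = 4 feature values of one category, M = 2 results requested.
features = [[3, 3], [1, 0], [0, 2], [5, 5]]
print(search_similar([0, 0], features, 2))
```

`heapq.nsmallest` avoids sorting all N distances when M is much smaller than N; a full sort followed by taking the first M entries would give the same result.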
As an example, fig. 5E shows a flowchart of the information processing system 500 processing a Put(Key) request.
As shown in fig. 5E, in response to an application of the host 501 calling the put (key) API, a driver or hypervisor run by the host 501 generates a clustering request to send to the processor 505. The clustering request carries a Key (Key) and X index values.
After receiving the clustering request, the processor 505 sends the clustering request to the accelerator 503, so that the accelerator 503 works in the manner described in the above embodiments, and obtains an index value corresponding to a Key (Key) of the clustering request.
The accelerator 503 stores the index value in an external memory and sends first completion information characterizing the completion of the clustering operation to the processor 505.
After receiving the first completion information, the processor 505 sends the first completion information and the index value corresponding to the keyword (Key) of the clustering request to the host 501. It is understood that the processor 505 may send a data packet carrying the first completion information and/or the index value to the host 501, or may signal an interrupt to the host 501, after which the host 501 actively obtains the first completion information and/or the index value.
After receiving the first completion information and the index value corresponding to the keyword (Key) of the clustering request, the driver or the hypervisor of the host 501 sends an address allocation request to the processor 505 according to the Put(Key) request being processed, where the address allocation request carries the keyword (Key) and the index value corresponding to the keyword (Key) of the clustering request.
The processor 505 sends the address allocation request to the mapping management module 504.
The mapping management module 504 allocates a storage address, which is an address in the solid state disk 502, to the input Key (Key) according to the received index value. Then, the mapping management module 504 sends the storage address and the second completion information to the processor 505.
The processor 505 sends the memory address and the second completion information to the host 501.
After receiving the second completion information and the memory address, the driver or the hypervisor of the host 501 sends the memory address and the data to be written to the processor 505 according to the put (key) request to be processed. The write data is, for example, a Key (Key) indicated when the put (Key) API is called and/or data corresponding to the Key (Key).
Processor 505 sends the memory address and the data to be written to solid state disk 502.
According to the storage address, the solid state disk 502 writes the data to be written and/or the keyword (Key) into the storage space indicated by the storage address, and sends third completion information to the host 501 through the processor 505.
The driver or hypervisor of the host 501, in response to receiving the third completion information, notifies the application that processing of the Put(Key) API it called is complete.
In an alternative embodiment, the operations of the driver or hypervisor of the host 501 shown in FIG. 5E are implemented by the processor 505, thereby reducing interaction between the processor 505 and the host 501 and improving processing efficiency.
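The three stages of the Put(Key) flow in fig. 5E (clustering, address allocation, write) can be strung together in a short sketch. Everything here is a simplified stand-in: the function `put`, the dictionaries used for the mapping table and the disk, the squared Euclidean metric, and the "next free address" allocator are assumptions made for illustration, and the completion messages exchanged between components are omitted.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def put(key, data, index_values, mapping, disk):
    """Model of Put(Key): cluster, allocate an address, then write."""
    # Stage 1 (accelerator 503): find the index value closest to the key.
    index = min(
        range(len(index_values)),
        key=lambda i: squared_distance(key, index_values[i]),
    )
    # Stage 2 (mapping manager 504): allocate a storage address under
    # that index value. Assumed allocator: next unused address.
    address = len(disk)
    mapping.setdefault(index, []).append(address)
    # Stage 3 (solid state disk 502): write the key and its data.
    disk[address] = (key, data)
    return index, address

index_values = [[0.0, 0.0], [1.0, 1.0]]
mapping, disk = {}, {}
print(put([0.9, 1.1], "img-a", index_values, mapping, disk))
print(put([0.1, 0.0], "img-b", index_values, mapping, disk))
```

In the system described, each stage is a separate request routed through the processor 505, with a completion message returned before the next stage starts; the sketch collapses these round trips into sequential calls.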
As yet another example, fig. 5F shows a flow chart of information processing system 500 processing a get (key) request. As shown in fig. 5F, the application of the host 501 calls the get (key) API. The driver of the host 501 sends a clustering request to the processor 505 in response to the get (key) API being called. The clustering request indicates a Key (Key) and X index values.
After receiving the clustering request, the processor 505 sends the clustering request to the accelerator 503, so that the accelerator 503 completes the clustering operation in the manner described in the above embodiment, and obtains an index value corresponding to a Key (Key) of the clustering request.
The accelerator 503 writes the index value to the external storage device, and the accelerator 503 also sends first completion information for characterizing the completion of the clustering operation to the processor 505.
The processor 505, upon receiving the first completion information, sends the first completion information and the index value to the host 501.
The driver or hypervisor of the host 501 sends an address obtaining request to the processor 505 according to the get (key) API called by the application in response to receiving the first completion information and the index value. The address fetch request indicates a Key (Key) and an index value.
The processor 505 sends an address fetch request to the mapping management module 504.
The mapping management module 504 obtains one or more storage addresses of the N feature values corresponding to the index value according to the index value indicated by the received address obtaining request, where the storage addresses are addresses in the solid state disk 502. The mapping management module 504 then sends the one or more memory addresses and the second completion information to the processor 505.
The processor 505 sends one or more memory addresses and second completion information to the host 501.
In response to receiving the second completion information and the one or more memory addresses, the driver of the host 501 sends the one or more memory addresses to the processor 505 according to the get (key) API called by the application program to instruct the processor 505 to read the N characteristic values from the solid state disk 502.
Processor 505 sends a data read request and one or more memory addresses to solid state disk 502.
In response, the solid state disk 502 reads N characteristic values corresponding to the one or more storage addresses from itself according to the one or more storage addresses, and sends third completion information and the N characteristic values to the host 501 through the processor 505.
The driver of the host 501, in response to receiving the third completion information and the N feature values, sends a search request to the processor 505 according to the invoked get (key) API. The search request indicates a Key (Key) and N feature values.
After receiving the search request, the processor 505 sends the search request to the accelerator 503.
After receiving the search request, the accelerator 503 operates in the manner described in the embodiments of the present application, and searches for M feature values closest to a Key (Key) from the N feature values.
The accelerator 503 writes the M feature values to the external memory, and the accelerator 503 transmits fourth completion information for characterizing completion of the search operation to the processor 505.
The processor 505, after receiving the fourth completion information, sends the fourth completion information and/or the M feature values to the host 501. The driver or hypervisor of the host 501, in response to receiving the fourth completion information and/or the M feature values, notifies the application that processing of the Get(Key) API it called is complete.
In an alternative embodiment, the operations of the driver or hypervisor of the host 501 shown in FIG. 5F are implemented by the processor 505, thereby reducing interaction between the processor 505 and the host 501 and improving processing efficiency.
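The four stages of the Get(Key) flow in fig. 5F (clustering, address lookup, read, search) can be modelled end to end as below. This is a sketch under assumptions, not the described hardware path: the function `get`, the dictionary-based mapping table and disk, and the squared Euclidean metric are all illustrative choices, and completion messages are omitted.

```python
import heapq

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def get(query, index_values, mapping, disk, m):
    """Model of Get(Key): cluster, look up addresses, read, then search."""
    # Stage 1 (accelerator 503): index value closest to the query key.
    index = min(
        range(len(index_values)),
        key=lambda i: squared_distance(query, index_values[i]),
    )
    # Stage 2 (mapping manager 504): addresses of the N feature values.
    addresses = mapping.get(index, [])
    # Stage 3 (solid state disk 502): read the N feature values.
    features = [disk[a] for a in addresses]
    # Stage 4 (accelerator 503): the M feature values closest to the query.
    return heapq.nsmallest(m, features, key=lambda v: squared_distance(query, v))

index_values = [[0, 0], [10, 10]]
mapping = {0: [0, 1, 2], 1: [3]}
disk = {0: [1, 1], 1: [0, 1], 2: [2, 2], 3: [9, 9]}
print(get([0, 0], index_values, mapping, disk, m=2))
```

Restricting the search to the N feature values of one category, rather than to every stored feature value, is what the clustering stage buys: stage 4 compares the query against N candidates instead of the whole data set.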
Referring to fig. 6, yet another embodiment of the present application provides a host 600.
The host 600 includes an application module 601, an API interface module 605, a management module 610, an acceleration driver module 620, a mapping driver module 630, and a hard disk driver module 640.
The application module 601 running in the host 600 calls an API provided by the API interface module 605. The APIs provided by the API interface module 605 include, for example, Get(Key), Get_Index(Key), Get_With_Index(Key, Index), Put(Key), Put_With_Index(Key, Index), and the like. In response to an API being called, the API interface module 605 processes the called API through the management module 610.
According to the called API, the management module 610 sends a request to one or more of the acceleration driver module 620, the mapping driver module 630, and the hard disk driver module 640. From the results that the acceleration driver module 620, the mapping driver module 630, and/or the hard disk driver module 640 give for the request, the management module 610 obtains a result for responding to the called API and provides it to the API interface module 605. The API interface module 605 provides the result to the application module 601 that called the API.
The acceleration driver module 620 serves as a driver that manages and operates an accelerator (e.g., accelerator 503 shown in fig. 5A) according to an embodiment of the present application; the mapping driver module 630 serves as a driver that manages and operates the mapping manager 504 according to the embodiment of the present application; the hard disk driver module 640 serves as a driver that manages and operates the solid state disk 502 according to the embodiment of the present application.
As an example, if the application module 601 calls the Put(Key) API, the management module 610 first sends a clustering request to the acceleration driver module 620. The clustering request indicates the keyword (Key) of the Put(Key) API. In response to the acceleration driver module 620 providing the result of the clustering request (indicating the index value corresponding to the keyword (Key)), the management module 610 then sends an address fetch request (indicating the index value and the keyword (Key)) to the mapping driver module 630. In response to the mapping driver module 630 providing the result of the address fetch request (one or more addresses), the management module 610 sends the storage address and the data to be written to the hard disk driver module 640. The hard disk driver module 640 writes the data to the specified storage address of the solid state disk and indicates to the management module 610 that the data write is complete, and the management module 610 indicates to the application module 601, through the API interface module 605, that the call to the Put(Key) API is complete.
As yet another example, the application module 601 calls the Get(Key) API, and the management module 610 first sends a clustering request to the acceleration driver module 620. The clustering request indicates the keyword (Key) of the Get(Key) API. In response to the acceleration driver module 620 providing the result of the clustering request (indicating the index value corresponding to the keyword (Key)), the management module 610 then sends an address fetch request (indicating the index value and the keyword (Key)) to the mapping driver module 630. In response to the mapping driver module 630 providing the result of the address fetch request (one or more addresses), the management module 610 sends the storage address to the hard disk driver module 640. The hard disk driver module 640 reads data from the storage address of the solid state disk, the read data being the N feature values corresponding to the index value, and supplies the N feature values to the management module 610. In response, the management module 610 sends a search request (indicating the keyword (Key) and the N feature values) to the acceleration driver module 620. The acceleration driver module 620 operates the accelerator and provides the M feature values that are closest to the keyword (Key) among the N feature values to the management module 610. The management module 610 provides the M feature values to the application module 601 through the API interface module 605 and indicates that the call to the Get(Key) API is complete.
As yet another example, the application module 601 calls the Get_With_Index(Key, Index) API. The management module 610 sends an address fetch request (indicating the index value and the keyword (Key)) to the mapping driver module 630. In response to the mapping driver module 630 providing the result of the address fetch request (one or more addresses), the management module 610 sends the storage address to the hard disk driver module 640. The hard disk driver module 640 reads data from the storage address of the solid state disk, the read data being the N feature values corresponding to the index value, and supplies the N feature values to the management module 610. In response, the management module 610 sends a search request (indicating the keyword (Key) and the N feature values) to the acceleration driver module 620. The acceleration driver module 620 operates the accelerator and provides the M feature values that are closest to the keyword (Key) among the N feature values to the management module 610. The management module 610 provides the M feature values to the application module 601 through the API interface module 605 and indicates that the call to the Get_With_Index(Key, Index) API is complete.
As yet another example, if the application module 601 calls the Put_With_Index(Key, Index) API, the management module 610 sends an address fetch request (indicating the index value and the keyword (Key)) to the mapping driver module 630. In response to the mapping driver module 630 providing the result of the address fetch request (one or more addresses), the management module 610 sends the storage address and the data to be written to the hard disk driver module 640. The hard disk driver module 640 writes the data to the specified storage address of the solid state disk and indicates to the management module 610 that the data write is complete, and the management module 610 indicates to the application module 601, through the API interface module 605, that the call to the Put_With_Index(Key, Index) API is complete.
As still another example, the application module 601 calls the Get_Index(Key) API. The management module 610 first sends a clustering request to the acceleration driver module 620. The clustering request indicates the keyword (Key) of the Get_Index(Key) API. In response to the acceleration driver module 620 providing the result of the clustering request (indicating the index value corresponding to the keyword (Key)), the management module 610 provides the index value to the application module 601 through the API interface module 605 and indicates that the call to the Get_Index(Key) API is complete.
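The way the management module 610 routes each API call through the three driver modules can be summarized as a small dispatch table. The table below restates the five examples above; the step names, the `PIPELINES` and `DRIVER_FOR_STEP` dictionaries, and the `plan` function are hypothetical names introduced only for this sketch.

```python
# Ordered steps the management module issues for each API, per the examples above.
PIPELINES = {
    "Put": ["cluster", "address", "write"],
    "Get": ["cluster", "address", "read", "search"],
    "Get_With_Index": ["address", "read", "search"],
    "Put_With_Index": ["address", "write"],
    "Get_Index": ["cluster"],
}

# Which driver module serves each kind of step.
DRIVER_FOR_STEP = {
    "cluster": "acceleration driver module 620",
    "search": "acceleration driver module 620",
    "address": "mapping driver module 630",
    "read": "hard disk driver module 640",
    "write": "hard disk driver module 640",
}

def plan(api_name):
    """Return the ordered (step, driver module) pairs for one API call."""
    return [(step, DRIVER_FOR_STEP[step]) for step in PIPELINES[api_name]]

for step, driver in plan("Get"):
    print(step, "->", driver)
```

Seen this way, the `_With_Index` variants differ from their plain counterparts only in skipping the initial clustering step, because the caller already supplies the index value.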
Although the present application has been described with reference to examples, which are intended to be illustrative only and not to be limiting of the application, changes, additions and/or deletions may be made to the embodiments without departing from the scope of the application.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An accelerator for cluster computation, comprising an arbitration module, a data handling module, and a distance computation module, wherein,
the arbitration module sends a second request to the data carrying module according to a received first request, wherein the first request carries a keyword, or the first request carries the keyword and a first index value corresponding to the keyword, the first request is a read request (Get) or a write request (Put), and the second request is a data carrying request;
when the first request carries the keyword, the data carrying module carries the keyword and X index values from the DDR to the distance calculation module according to the second request, wherein X is an integer greater than or equal to 2;
and the distance calculation module performs distance calculation on the keyword according to the X index values to obtain the first index value, wherein the first index value is the index value with the minimum distance from the keyword in the X index values.
2. The accelerator of claim 1, wherein when the first request carries the key and the first index value,
the data carrying module carries the keywords and the N characteristic values corresponding to the first index value to the distance calculation module from the DDR according to the second request;
the distance calculation module performs distance calculation on the keyword according to the N characteristic values to acquire M first characteristic values with the minimum distance from the keyword from the N characteristic values, wherein N and M are integers which are greater than or equal to 1, and M is smaller than or equal to N.
3. The accelerator of claim 1 or 2, further comprising a normalization module that, prior to the distance calculation module receiving the key and the X index values,
the normalization module performs normalization operation on the keyword and the X index values, and sends the normalized keyword and the X index values to the distance calculation module.
4. The accelerator of claim 2 or 3, wherein when the first request carries the key and the first index value,
the distance calculation module calculates the distance between each characteristic value in the N characteristic values and the keyword to obtain N first distances;
the distance calculation module sorts the N first distances from small to large according to values to obtain a sorted distance queue;
the distance calculation module selects the M first characteristic values from the distance queue.
5. The accelerator of any one of claims 1-4, wherein when the distance calculation module performs distance calculation on the key, the distance calculation module calculates a distance between each of the X index values and the key to obtain X second distances;
the distance calculation module compares the magnitudes of any two second distances among the X second distances, and determines the index value of the smallest second distance among the X index values as the first index value.
6. The accelerator of any one of claims 1-5, wherein the distance computation module comprises Y computation units, the key comprises Z sets of values, Y and Z are each integers greater than or equal to 1, and Z is less than or equal to Y,
and the distance calculation module performs distance calculation on the Z groups of numerical values simultaneously according to the X index values by utilizing the Y calculation units to obtain the index value corresponding to each numerical value in the Z groups of numerical values.
7. The accelerator of any one of claims 1-6, wherein the first request further indicates a channel used by the accelerator to process the first request, the arbitration module to instruct at least one computation unit of the distance computation module to process the first request according to the channel indicated by the first request.
8. The accelerator of claim 7, wherein the arbitration module assigns a compute unit identification to each of the plurality of first messages, the compute unit identification to instruct a first compute unit in the distance computation module to process a first message carrying the first compute unit identification, the first compute unit identification corresponding to the first compute unit.
9. The accelerator according to claim 7 or 8,
if the distance calculation module does not finish processing the first messages, the arbitration module does not instruct the distance calculation module to process the messages in the channels even if the number of the first messages is smaller than the number of the calculation units in the distance calculation module and the distance calculation module has a free calculation unit.
10. The accelerator according to any one of claims 1 to 9, wherein the distance calculation module further comprises Y management units, each management unit being coupled to one of the Y calculation units, the arbitration module assigning a calculation unit identifier to the Z sets of values when sending the second request, each management unit obtaining values from the Z sets of values for which the calculation unit identifiers are the same as itself, each management unit sending the obtained values to the calculation unit coupled to itself.
CN201910874345.2A 2019-08-30 2019-09-17 Accelerator for cluster computation Active CN111581441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210095018.9A CN114461861A (en) 2019-08-30 2019-09-17 Accelerator for cluster computation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019108188990 2019-08-30
CN201910818899 2019-08-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210095018.9A Division CN114461861A (en) 2019-08-30 2019-09-17 Accelerator for cluster computation

Publications (2)

Publication Number Publication Date
CN111581441A (en) 2020-08-25
CN111581441B (en) 2022-06-17

Family

ID=72113264

Family Applications (4)

Application Number Title Priority Date Filing Date
CN201910874351.8A Active CN111580742B (en) 2019-08-30 2019-09-17 Method for processing read (Get)/Put request using accelerator and information processing system thereof
CN201910874345.2A Active CN111581441B (en) 2019-08-30 2019-09-17 Accelerator for cluster computation
CN202110565253.3A Pending CN113138724A (en) 2019-08-30 2019-09-17 Method for processing read (Get)/Put request using accelerator and information processing system thereof
CN202210095018.9A Pending CN114461861A (en) 2019-08-30 2019-09-17 Accelerator for cluster computation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910874351.8A Active CN111580742B (en) 2019-08-30 2019-09-17 Method for processing read (Get)/Put request using accelerator and information processing system thereof

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202110565253.3A Pending CN113138724A (en) 2019-08-30 2019-09-17 Method for processing read (Get)/Put request using accelerator and information processing system thereof
CN202210095018.9A Pending CN114461861A (en) 2019-08-30 2019-09-17 Accelerator for cluster computation

Country Status (1)

Country Link
CN (4) CN111580742B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211341A (en) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Image intelligent mode recognition and searching method
CN106383695A (en) * 2016-09-14 2017-02-08 中国科学技术大学苏州研究院 FPGA-based clustering algorithm acceleration system and design method thereof
CN110275838A (en) * 2018-03-16 2019-09-24 北京忆芯科技有限公司 The address conversion and its accelerator of KV storage equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984007B2 (en) * 2014-03-28 2018-05-29 Samsung Electronics Co., Ltd. Storage system and method for performing and authenticating write-protection thereof
US9727485B1 (en) * 2014-11-24 2017-08-08 Pure Storage, Inc. Metadata rewrite and flatten optimization
US10210087B1 (en) * 2015-03-31 2019-02-19 EMC IP Holding Company LLC Reducing index operations in a cache
US9891935B2 (en) * 2015-08-13 2018-02-13 Altera Corporation Application-based dynamic heterogeneous many-core systems and methods
US10248316B1 (en) * 2015-09-30 2019-04-02 EMC IP Holding Company LLC Method to pass application knowledge to a storage array and optimize block level operations
CN107948233B (en) * 2016-10-13 2021-01-08 华为技术有限公司 Method for processing write request or read request, switch and control node
CN111897751A (en) * 2017-01-26 2020-11-06 华为技术有限公司 Data transmission method, device, equipment and system
CN109101185B (en) * 2017-06-20 2023-08-11 北京忆恒创源科技股份有限公司 Solid-state storage device and write command and read command processing method thereof
JP2019079448A (en) * 2017-10-27 2019-05-23 株式会社日立製作所 Storage system and control method thereof
CN110019016A (en) * 2017-12-29 2019-07-16 北京忆恒创源科技有限公司 The KV for providing logic key stores device and method thereof

Also Published As

Publication number Publication date
CN111580742A (en) 2020-08-25
CN113138724A (en) 2021-07-20
CN111580742B (en) 2021-06-15
CN114461861A (en) 2022-05-10
CN111581441B (en) 2022-06-17

Similar Documents

Publication Publication Date Title
US20220057940A1 (en) Method and Apparatus for SSD Storage Access
US11397529B2 (en) Method and device for determining strategy for data placement within SSD
US11874815B2 (en) Key-value storage device and method of operating the same
CN110851383B (en) Method and equipment for managing storage system
WO2020154530A1 (en) Low latency swap device, system and method
WO2022007596A1 (en) Image retrieval system, method and apparatus
US20240086113A1 (en) Synchronous write method and device, storage system and electronic device
CN111581247B (en) Data manager, time sequence database and information processing system
CN111580742B (en) Method for processing read (Get)/Put request using accelerator and information processing system thereof
CN115079936A (en) Data writing method and device
CN107688435B (en) IO stream adjusting method and device
CN113721838A (en) Writing and reading data method for storage device, storage controller and DMA engine
CN113126908A (en) Storage device configured to support multi-streaming and method of operating the same
TWI810876B (en) Method and computer program product and apparatus for data access in response to host discard commands
US20240086110A1 (en) Data storage method, storage apparatus and host
WO2022237245A1 (en) Data processing method and apparatus
US20230229493A1 (en) Electronic system, operating method thereof, and operating method of memory device
CN114168084A (en) File merging method, file merging device, electronic equipment and storage medium
KR20240035325A (en) Data storage device operation method, storage apparatus and host
JP2023137488A (en) Storage system and data cache method
CN116644050A (en) Distributed storage method in file VDI environment
CN111367825A (en) Virtual parity data caching for storage devices
CN111159065A (en) Hardware buffer management unit with key word (BMU)
CN116931812A (en) Data access method and storage medium and apparatus for responding to host discard command
CN116955272A (en) File storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant