CN114266302A - Deep learning Embedding data efficient processing system and method for heterogeneous memory device - Google Patents

Deep learning Embedding data efficient processing system and method for heterogeneous memory device

Info

Publication number
CN114266302A
CN114266302A (application CN202111547323.9A)
Authority
CN
China
Prior art keywords: data, Embedding, sorting, packing, closed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111547323.9A
Other languages
Chinese (zh)
Inventor
何水兵
陈平
李旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111547323.9A
Publication of CN114266302A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a system and method for efficiently processing deep learning Embedding data on heterogeneous memory devices. The system comprises three modules: an Embedding data placement module, which classifies the Embedding data and pre-summed data and places them on NVM (non-volatile memory) or DRAM (dynamic random access memory); an efficient index establishing module, which builds an index over the placed data; and an Embedding operation running module, which uses the established index to rapidly locate the Embedding data involved in a request and executes the normal Embedding operation. The method exploits the hot/cold access characteristics and the packing (co-occurrence) characteristics of deep learning Embedding data to arrange data across the heterogeneous memory devices, and establishes a lightweight index to serve task requests efficiently. The system thereby makes maximal use of both DRAM and NVM space and improves the processing efficiency of Embedding data.

Description

Deep learning Embedding data efficient processing system and method for heterogeneous memory device
Technical Field
The invention relates to the deep learning field of computer science, and in particular to an efficient processing system for deep learning Embedding data on heterogeneous memory devices.
Background
Advances in deep learning have greatly promoted the development of computer vision, natural language processing, medicine and other fields, and have attracted extensive attention in both academia and industry. In deep learning, Embedding is widely used to represent data owing to its powerful representation capability. At present, the volume of Embedding data is large and requires substantial storage space; from the standpoints of both cost and device capacity, existing memory cannot meet application requirements. With continuing innovation in memory technology, non-volatile memory (NVM) offers large capacity, low price, byte addressability and high read/write bandwidth (relative to disk), providing a new solution to the capacity problem. However, NVM has poor random-read performance, and access to large volumes of irregular, random Embedding data drastically reduces the operating efficiency of an NVM-based system (a DRAM + NVM heterogeneous memory system). Designing an efficient processing system for deep learning Embedding data on heterogeneous memory devices, so as to improve application efficiency, therefore becomes important.
Disclosure of Invention
In order to solve the performance degradation caused by access to large volumes of irregular, random Embedding data, the invention provides an efficient processing system for deep learning Embedding data on heterogeneous memory devices.
The invention comprises two key techniques:
1. Intelligently placing the Embedding data.
A deep learning data set is divided into a training set and a test set: the training set is known data used for model training, and the test set is unknown data used for model inference. Because the training set and the test set exhibit similar data access patterns, data analysis is performed on the training set and the effect is verified on the test set.
Because Embedding data exhibit a pronounced hot/cold distribution, the data are first sorted by access frequency and the top h% are selected as hot data (h% can be customized by the user).
According to the frequent items of deep learning Embedding data (several Embedding entries are often accessed together), the frequent item sets are pre-summed and the resulting pre-summed data are stored.
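As a minimal illustration of the classification step (the function and variable names below are assumptions, not from the patent), the frequency sort and h% split might look as follows:

```python
from collections import Counter

def split_hot_cold(access_log, h_percent=5):
    """Sort Embedding IDs by access frequency in descending order and
    take the top h% as hot data; the remainder is cold data.
    access_log is an iterable of Embedding IDs seen in the training set."""
    freq = Counter(access_log)
    ranked = [eid for eid, _ in freq.most_common()]   # descending frequency
    # The rank position doubles as the sorting ID of each Embedding.
    sorting_id = {eid: rank for rank, eid in enumerate(ranked)}
    n_hot = max(1, len(ranked) * h_percent // 100)
    return sorting_id, set(ranked[:n_hot]), set(ranked[n_hot:])
```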
2. Establishing an efficient index over the placed data.
After the different classes of Embedding data are placed, the system needs to establish a lightweight index (with little memory-space overhead) over the data so that fast lookup is possible when a request arrives. Specifically:
The sorting ID serves as the index of a single datum; the stored pre-summed data are indexed by the sorting-ID set of the corresponding Embeddings, i.e., the closed frequent item set; meanwhile, the packing relation of each Embedding within the closed frequent items is expressed following the idea of an adjacency matrix.
Specifically, the technical scheme adopted by the invention is as follows:
a deep learning Embedding data efficient processing method for heterogeneous memory devices comprises the following steps:
(1) Sort the Embedding data in the training set in descending order of access frequency, assign sorting IDs, and divide the sorted data into hot data and cold data. All hot data are stored in DRAM and all cold data in NVM. Meanwhile, closed frequent item mining is performed on the hot data to obtain closed frequent item sets; finally, the pre-summed data of closed frequent item sets whose packing count (the number of Embeddings packed together) is greater than 3 are stored in NVM, and those whose packing count is 3 or fewer are stored in DRAM.
(2) Express the packing relation of each Embedding within the closed frequent item sets following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent item set.
(3) Process requests: sort the Embedding data in the request in ascending order of sorting ID and determine whether each Embedding datum in the request belongs to hot data or cold data, wherein:
For the hot data in the request, judge one by one, according to the packing relations established in step (2) and the sorting result, whether packing relations exist among the Embeddings. If consecutive Embeddings have a packing relation and the packing count is greater than 3, query the NVM with the sorting IDs of the consecutive Embeddings to obtain the corresponding pre-summed data; if the packing count is 3 or fewer, query the DRAM with those sorting IDs instead. If no packing relation exists, query the DRAM directly for the corresponding Embedding data, until the query completes.
For the cold data in the request, query the NVM directly according to the sorting IDs to obtain the corresponding Embedding data, until the query completes.
Further, in step (1), the front portion of the cold data is split off as warm data; all Embedding data among the warm data are packed with a superset according to sorting ID and stored in NVM, recording the largest packing granularity M of the current superset and the sorting ID I of the corresponding largest packed Embedding. When warm data in a request are processed, Embedding data whose sorting ID is smaller than I are queried as pre-summed data at granularity M, and data whose sorting ID is larger than I are queried at granularity M-1.
Further, the original data are divided into the first h% as hot data, the middle w% as warm data and the last c% as cold data (h, w and c can be customized); preferably, hot data account for 5%, warm data for 5%-20%, and the remainder is cold data.
Further, step (2) also includes sorting the closed frequent item sets and reassigning the hot-data sorting IDs according to them, specifically as follows:
Sort the closed frequent item sets in descending order of the number of Embedding data they contain; among sets containing the same number of Embedding data, sort in descending order of occurrence probability.
Reassign the hot-data sorting IDs according to the order in which the Embeddings appear in the sorted closed frequent item sets.
Further, in step (2), the packing relation of each Embedding within the closed frequent item sets is expressed following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent item, specifically:
A bitmap represents the packing relations of each Embedding datum, a packing relation being recorded as 1 and its absence as 0; each Embedding's bitmap records only its packing relations with the Embedding data whose sorting IDs are larger than its own.
Further, in step (3), if the NVM or DRAM is queried with the sorting IDs of consecutive Embeddings and no corresponding pre-summed data are found, the largest sorting ID among them is removed, one at a time, and the query retried until pre-summed data are obtained.
The deep learning Embedding data efficient processing system for heterogeneous memory devices based on the above method comprises:
An Embedding data placement module, which sorts the Embedding data in the training set in descending order of access frequency, assigns sorting IDs, and divides the sorted data into hot data and cold data, storing all hot data in DRAM and all cold data in NVM; meanwhile, it performs closed frequent item mining on the hot data to obtain closed frequent item sets, storing the pre-summed data of sets whose packing count is greater than 3 in NVM and those whose packing count is 3 or fewer in DRAM.
An efficient index establishing module, which expresses the packing relation of each Embedding within the closed frequent item sets following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it.
An Embedding operation running module, which uses the established index and packing relations to rapidly locate the Embedding data involved in a request and executes the normal Embedding operation.
The beneficial effects of the invention are as follows: the invention provides an efficient processing system for deep learning Embedding data on heterogeneous memory devices, designing an intelligent data placement scheme for the heterogeneous memory devices and establishing a lightweight index to serve task requests efficiently.
Drawings
FIG. 1 is a block diagram of a system framework;
FIG. 2 is a schematic diagram of an Embedding operation;
FIG. 3 is a CDF diagram of Embedding data access frequency;
FIG. 4 is a schematic diagram of closed frequent item data (pre-summed data) acquisition;
FIG. 5 is a schematic diagram of Embedding data type division;
FIG. 6 is a schematic diagram comparing NVM single access to DRAM multiple access latency;
FIG. 7 is a schematic diagram of Embedding data placement;
FIG. 8 is a schematic diagram of sorting-ID reallocation for Embedding data;
FIG. 9 is an index diagram;
FIG. 10 is a schematic diagram of request processing.
Detailed Description
The invention is further illustrated below with reference to specific embodiments and the accompanying drawings:
The invention provides an efficient processing method for deep learning Embedding data on heterogeneous memory devices: data are placed across the heterogeneous memory devices by exploiting the hot/cold characteristics and the packing (co-occurrence) characteristics of deep learning Embedding data; a lightweight, efficient index structure is established; and an Embedding processing flow is designed to improve the operating efficiency on Embedding data.
Based on this method, the invention also provides an efficient processing system for deep learning Embedding data on heterogeneous memory devices. The system framework is shown in FIG. 1 and comprises three modules: the Embedding data placement module classifies the Embedding data and pre-summed data and places them on NVM or DRAM; the efficient index establishing module builds an index over the placed data; and the Embedding operation running module uses the established index to rapidly locate the Embedding data involved in a request and executes the normal Embedding operation. The Embedding operation is shown in FIG. 2: the system randomly reads several Embedding data from the storage device and sums them.
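For concreteness, the Embedding operation of FIG. 2 reduces to gathering several vectors and summing them element-wise; a minimal NumPy sketch (illustrative only, not the patent's implementation):

```python
import numpy as np

def embedding_op(table, ids):
    """Randomly read the requested Embedding vectors from the storage
    device (here a NumPy array stands in for DRAM/NVM) and sum them."""
    return np.sum(table[ids], axis=0)

# Toy usage: 10 Embedding vectors of dimension 4, request {1, 2, 6, 9}.
table = np.random.rand(10, 4).astype(np.float32)
result = embedding_op(table, [1, 2, 6, 9])
```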
The processing flow of the invention is as follows:
(1) Data placement:
As shown in FIG. 3, much Embedding data exhibits a hot/cold characteristic (a small portion of the data is accessed frequently), and these hot data frequently appear together (frequent items). It is therefore possible to perform hot/cold analysis on the training Embedding data, place the data appropriately across DRAM and NVM, and at the same time extract the frequent-item data and pre-sum them before storage, specifically as follows:
Sort the Embedding data in the training set in descending order of access frequency, assign sorting IDs, and divide the sorted data into hot data and cold data. All hot data are stored in DRAM and all cold data in NVM.
Meanwhile, closed frequent item mining is performed on the hot data; finally, the pre-summed data of closed frequent item sets whose packing count is greater than 3 are stored in NVM, and those whose packing count is 3 or fewer are stored in DRAM.
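A sketch of this placement rule (plain dicts stand in for the DRAM and NVM devices; the names are assumptions, and the threshold of 3 follows the latency observation of FIG. 6 below):

```python
def place_data(vectors, hot, cold, presums, dram, nvm, pack_threshold=3):
    """Store hot Embeddings in DRAM, cold Embeddings in NVM, and route each
    pre-summed closed frequent item set by its packing count: sets packing
    more than pack_threshold Embeddings go to NVM, the rest to DRAM.
    presums maps a frozenset of sorting IDs to its pre-summed vector."""
    for eid in hot:
        dram[eid] = vectors[eid]
    for eid in cold:
        nvm[eid] = vectors[eid]
    for item_set, presum_vec in presums.items():
        target = nvm if len(item_set) > pack_threshold else dram
        target[frozenset(item_set)] = presum_vec
```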
(2) Index establishment over the placed data: the packing relation of each Embedding within the closed frequent items is expressed following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent item; an index of each closed frequent item is then established according to the Embedding data sorting IDs.
(3) When an inference task arrives, Embedding data are looked up using the index and the Embedding operation is performed.
As a preferred scheme, step (1) can be divided into the following sub-steps:
Step one: as shown in FIG. 4, the system first sorts the data in descending order of access frequency and divides hot data from cold data by a user-selected threshold.
Step two: cold data are removed from the requests.
Step three: the requests containing only hot data are fed into a closed frequent item mining algorithm (the MBEA or MMBEA algorithm).
Step four: several closed frequent item sets are obtained; they are first sorted in descending order of the number of data in each frequent item, and closed frequent items containing the same number of data are sorted in descending order of occurrence probability.
Further, part of the cold data may additionally be selected as warm data according to a user-set threshold, as shown in FIG. 5. To make full use of DRAM space, hot data are stored in DRAM while warm data and cold data are stored in NVM. The pre-summed data of the closed frequent item sets are placed based on the following observation:
As can be seen from FIG. 6, the latency of one NVM access is slightly greater than that of 3 DRAM accesses. To keep the overall performance of the Embedding operation with NVM at least equal to that of an all-DRAM device, only pre-summed data packing more than 3 Embeddings are placed in NVM, while pre-summed data packing 3 or fewer are placed in DRAM. The cold data are stored entirely on the slow NVM device. Since warm data may also benefit from pre-summing, they are fully packed with a superset (that is, the data are pre-summed at granularities from 2 up to n until the user-set storage threshold is reached, recording the largest packing granularity M of the current superset and the sorting ID I of the corresponding largest packed Embedding). In addition, the sorting-ID ranges of the hot, warm and cold data are recorded to facilitate subsequent lookup. The final data placement is shown in FIG. 7.
As a preferred scheme, the index establishment of step (2) proceeds as follows:
To facilitate index creation, the hot-data sorting IDs are reassigned to the Embedding data in their order of appearance in the sorted frequent items, as shown in FIG. 8.
An index is then established from the closed frequent items on the following basis: the packing relation of each Embedding within a closed frequent item is expressed following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it. Take the Embedding-1 data in FIG. 8 as an example: the data connected to Embedding-1 are Embedding-2, Embedding-3 and Embedding-8, so the index must record these connection relations. Furthermore, for a packing of multiple data, e.g. {1, 2, 6, 9}, 1 must be connected to 2, 2 to 6, and 6 to 9. Preferably, to further reduce the index space overhead, the relations are represented with a bit-granularity bitmap, as shown in FIG. 9, where [1] denotes the sorting ID of the corresponding Embedding-1; the connection relations between Embedding-1 and the Embedding data with IDs greater than 1 are then recorded in sequence, a bit value of 1 denoting connection and 0 denoting no connection.
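A minimal sketch of such a bitmap index (Python integers serve as the bitmaps; bit j of an Embedding's bitmap marking a connection to the Embedding j+1 positions after it is one way to realize FIG. 9):

```python
def build_bitmap_index(closed_sets, n_hot):
    """Bit j of bitmaps[i] is 1 iff Embedding i is connected, within some
    closed frequent item set, to Embedding i + 1 + j. Only IDs larger
    than i are recorded, halving the adjacency matrix as described."""
    bitmaps = [0] * n_hot
    for item_set in closed_sets:
        ids = sorted(item_set)
        for a, b in zip(ids, ids[1:]):      # e.g. {1,2,6,9}: 1-2, 2-6, 6-9
            bitmaps[a] |= 1 << (b - a - 1)
    return bitmaps

def connected(bitmaps, a, b):
    """True iff Embedding a points to Embedding b (requires a < b)."""
    return bool((bitmaps[a] >> (b - a - 1)) & 1)
```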
Finally, the packed data are pre-summed and stored in a hash table; at storage time the pre-summed data are indexed by the sorting-ID set of the corresponding Embeddings, i.e., the closed frequent item set. For example:
{1, 3, 5} is a closed frequent item set; E is summed in advance (E = Embedding-1 + Embedding-3 + Embedding-5), and E is then indexed under the hash H("135").
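A sketch of this hash store (a Python dict is itself a hash table; the concatenated-ID key mirrors the H("135") example, though a real implementation would need a delimiter once sorting IDs exceed one digit):

```python
presum_store = {}                 # hash table: key -> pre-summed vector

def store_presum(item_ids, vectors):
    """Pre-sum the Embeddings of one closed frequent item set and store
    the result under the concatenation of its ascending sorting IDs."""
    key = "".join(str(i) for i in sorted(item_ids))   # (1, 3, 5) -> "135"
    presum_store[key] = sum(vectors[i] for i in item_ids)
    return key
```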
As a preferred scheme, in step (3), when an inference task arrives, request processing proceeds as follows:
The Embeddings involved in the new request are mapped to their new sorting IDs, and the sorting IDs are sorted in ascending order.
As shown in FIG. 10, when a new request arrives, each ID in the request is known to belong to hot, warm or cold data. The lookup proceeds as follows:
For hot data, judge one by one, according to the established packing relations and the sorting result, whether packing relations exist among the Embeddings belonging to hot data. If consecutive Embeddings have a packing relation and the packing count is greater than 3, query the NVM with the sorting IDs of the consecutive Embeddings to obtain the corresponding pre-summed data; if the packing count is 3 or fewer, query the DRAM instead. If no packing relation exists, query the DRAM directly for the corresponding Embedding data, until the query completes.
Taking the request of FIG. 10 as an example: for 1, check whether it is connected to 2, then 2 to 6, and 6 to 9. The lookup finds that 9 is not connected to 10, so the previously connected data 1269 are hashed by their sorting IDs; since the packing count is greater than 3 (counts of 3 or fewer are looked up in DRAM), the lookup goes to NVM. The remaining hot data are looked up in the same manner. If, however, the specified data are not found on the device after hashing (a rare corner case), one Embedding is rolled back and the hash is looked up again; for example, if the hash value of 1234 is not found, roll back to the hash value of 123.
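The hot-data lookup with rollback might be sketched as follows (reusing the bitmap test and key scheme from the sketches above; for brevity the Embeddings dropped by a rollback are not re-queried here, which a full implementation would have to do):

```python
def lookup_hot(req_ids, bitmaps, dram, nvm, pack_threshold=3):
    """Walk the request's hot IDs (ascending), greedily growing a chain
    while consecutive IDs are connected in the bitmap index, then fetch
    the chain's pre-summed data: from NVM if more than pack_threshold
    Embeddings are packed, else from DRAM, rolling back the largest ID
    on a miss (the 1234 -> 123 case)."""
    def connected(a, b):
        return bool((bitmaps[a] >> (b - a - 1)) & 1)

    def fetch(chain):
        while len(chain) > 1:
            key = "".join(map(str, chain))            # H("1269")-style key
            device = nvm if len(chain) > pack_threshold else dram
            if key in device:
                return device[key]
            chain = chain[:-1]                        # roll back largest ID
        return dram[chain[0]]                         # plain hot-data read

    results, chain = [], [req_ids[0]]
    for prev, cur in zip(req_ids, req_ids[1:]):
        if connected(prev, cur):
            chain.append(cur)
        else:
            results.append(fetch(chain))
            chain = [cur]
    results.append(fetch(chain))
    return results
```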
For warm data, since in this scheme all warm data are packed with a superset, when the largest packing granularity M of the stored current superset is 2, i.e., pre-sums of 2 are stored, 1522, 2833 and 39 are looked up sequentially in pairs from the NVM.
For cold data, the NVM is queried directly, one datum at a time according to the sorting IDs, until the query completes.
The following is a specific example to further illustrate the beneficial effects of the present invention:
the specific experiment is as follows:
experimental configuration:
(1) Operating system: Ubuntu 18.04.3 LTS;
(2) CPU: 8-core Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz, equipped with 32GB DRAM;
(3) Storage devices: 512GB SK hynix SC311 SATA SSD; Western Digital WDC WD40EZRZ-75G HDD; Intel Optane NVM 256GB;
the final test results are:
comparison scheme: using a random scheme to store the Embedding data on the DRAM and the NVM without distinction; the scheme of the invention is as follows: dividing data according to cold and hot characteristics and using packaging storage; compared with a comparison scheme, the total access performance of the method disclosed by the invention on the common data set MovieLens is improved by 1.5 times.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments exhaustively here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (7)

1. A deep learning Embedding data efficient processing method for heterogeneous memory devices, characterized by comprising the following steps:
(1) Sort the Embedding data in the training set in descending order of access frequency, assign sorting IDs, and divide the sorted data into hot data and cold data. All hot data are stored in DRAM and all cold data in NVM. Meanwhile, closed frequent item mining is performed on the hot data to obtain closed frequent item sets; finally, the pre-summed data of closed frequent item sets whose packing count is greater than 3 are stored in NVM, and those whose packing count is 3 or fewer are stored in DRAM.
(2) Express the packing relation of each Embedding within the closed frequent items following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent item.
(3) Process requests: sort the Embedding data in the request in ascending order of sorting ID and determine whether each Embedding datum in the request belongs to hot data or cold data, wherein:
For the hot data in the request, judge one by one, according to the packing relations established in step (2) and the sorting result, whether packing relations exist among the Embeddings. If consecutive Embeddings have a packing relation and the packing count is greater than 3, query the NVM with the sorting IDs of the consecutive Embeddings to obtain the corresponding pre-summed data; if the packing count is 3 or fewer, query the DRAM instead. If no packing relation exists, query the DRAM directly for the corresponding Embedding data, until the query completes.
For the cold data in the request, query the NVM directly according to the sorting IDs to obtain the corresponding Embedding data, until the query completes.
2. The method according to claim 1, wherein in step (1) the front portion of the cold data is split off as warm data; all Embedding data among the warm data are packed with a superset according to sorting ID and stored in NVM, recording the largest packing granularity M of the current superset and the sorting ID I of the corresponding largest packed Embedding; when warm data in a request are processed, Embedding data whose sorting ID is smaller than I are queried as pre-summed data at granularity M, and data whose sorting ID is larger than I are queried at granularity M-1.
3. The method of claim 2, wherein hot data account for 5%, warm data for 5% to 20%, and the remainder is cold data.
4. The method according to claim 1, wherein step (2) further comprises sorting the closed frequent item sets and reassigning the hot-data sorting IDs according to them, as follows:
Sort the closed frequent item sets in descending order of the number of Embedding data they contain; among sets containing the same number of Embedding data, sort in descending order of occurrence probability.
Reassign the hot-data sorting IDs according to the order in which the Embeddings appear in the sorted closed frequent item sets.
5. The method according to claim 4, wherein in step (2) the packing relation of each Embedding within the closed frequent item sets is expressed following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent item, specifically:
A bitmap represents the packing relations of each Embedding datum, a packing relation being recorded as 1 and its absence as 0; each Embedding's bitmap records only its packing relations with the Embedding data whose sorting IDs are larger than its own.
6. The method according to claim 1, wherein in step (3), if the NVM or DRAM is queried with the sorting IDs of consecutive Embeddings and no corresponding pre-summed data are found, the largest sorting ID among them is removed, one at a time, and the query retried until pre-summed data are obtained.
7. A deep learning Embedding data efficient processing system for heterogeneous memory devices based on the method of any one of claims 1 to 6, comprising:
An Embedding data placement module, which sorts the Embedding data in the training set in descending order of access frequency, assigns sorting IDs, and divides the sorted data into hot data and cold data, storing all hot data in DRAM and all cold data in NVM; meanwhile, it performs closed frequent item mining on the hot data to obtain closed frequent item sets, storing the pre-summed data of sets whose packing count is greater than 3 in NVM and those whose packing count is 3 or fewer in DRAM.
An efficient index establishing module, which expresses the packing relation of each Embedding within the closed frequent items following the idea of an adjacency matrix, each Embedding pointing to the Embeddings connected to it in the closed frequent items.
An Embedding operation running module, which uses the established index and packing relations to rapidly locate the Embedding data involved in a request and executes the normal Embedding operation.
CN202111547323.9A 2021-12-16 2021-12-16 Deep learning Embedding data efficient processing system and method for heterogeneous memory device Pending CN114266302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111547323.9A CN114266302A (en) 2021-12-16 2021-12-16 Deep learning Embedding data efficient processing system and method for heterogeneous memory device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111547323.9A CN114266302A (en) 2021-12-16 2021-12-16 Deep learning Embedding data efficient processing system and method for heterogeneous memory device

Publications (1)

Publication Number Publication Date
CN114266302A true CN114266302A (en) 2022-04-01

Family

ID=80827644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111547323.9A Pending CN114266302A (en) 2021-12-16 2021-12-16 Deep learning Embedding data efficient processing system and method for heterogeneous memory device

Country Status (1)

Country Link
CN (1) CN114266302A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700995A (en) * 2023-08-03 2023-09-05 浪潮电子信息产业股份有限公司 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool
CN116700995B (en) * 2023-08-03 2023-11-03 浪潮电子信息产业股份有限公司 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Similar Documents

Publication Publication Date Title
CN104021161B (en) A kind of clustering storage method and device
WO2013152678A1 (en) Method and device for metadata query
CN102622434B (en) Data storage method, data searching method and device
CN106874348B (en) File storage and index method and device and file reading method
US20040109376A1 (en) Method for detecting logical address of flash memory
WO2020057272A1 (en) Index data storage and retrieval methods and apparatuses, and storage medium
CN107766529B (en) Mass data storage method for sewage treatment industry
CN103473276B (en) Ultra-large type date storage method, distributed data base system and its search method
KR101656750B1 (en) Method and apparatus for archiving and searching database with index information
CN112882663B (en) Random writing method, electronic equipment and storage medium
JP2019512125A (en) Database archiving method and apparatus, archived database search method and apparatus
US20180210907A1 (en) Data management system, data management method, and computer program product
US20110179013A1 (en) Search Log Online Analytic Processing
CN110795363A (en) Hot page prediction method and page scheduling method for storage medium
CN114266302A (en) Deep learning Embedding data efficient processing system and method for heterogeneous memory device
US9627065B2 (en) Memory equipped with information retrieval function, method for using same, device, and information processing method
CN103902693B (en) A kind of method of the memory database T tree index structures for reading optimization
CN111625600B (en) Data storage processing method, system, computer equipment and storage medium
CN110990340B (en) Big data multi-level storage architecture
CN112434085A (en) Roaring Bitmap-based user data statistical method
US7672925B2 (en) Accelerating queries using temporary enumeration representation
Nie et al. Efficient storage support for real-time near-duplicate video retrieval
US9305080B2 (en) Accelerating queries using delayed value projection of enumerated storage
CN110334073A (en) A kind of metadata forecasting method, device, terminal, server and storage medium
CN110297836B (en) User label storage method and retrieval method based on compressed bitmap mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination