CN115080598A

CN115080598A - Memory device and method of querying data in memory device

Info

Publication number: CN115080598A
Application number: CN202210850754.0A
Authority: CN
Inventors: 熊方舟; 余德军; 杨上山; 张钰勃
Original assignee: Moore Threads Technology Co Ltd
Current assignee: Moore Threads Technology Co Ltd
Priority date: 2022-07-20
Filing date: 2022-07-20
Publication date: 2022-09-20

Abstract

A memory device and a method of querying data in the memory device are disclosed. The memory device includes at least one sub-region. Each of the at least one sub-region comprises a first bloom filter and a second bloom filter. The first bloom filter is configured to record deletion of data within the corresponding sub-region, and the second bloom filter is configured to record insertion of data within the corresponding sub-region. The memory device provided by the embodiments of the application can determine whether data has been deleted from a given sub-region, so that data deletion does not degrade query results, and the speed, efficiency and accuracy of data queries are improved.

Description

Memory device and method of querying data in memory device

Technical Field

The present application relates to the technical field of data storage and data query, and more particularly, to a memory device and a method of querying data in the memory device.

Background

Rapidity and accuracy are important metrics when querying data in memory.

At present, databases that store and search data sequentially are widely used. In a conventional data query process, each data record in the memory may be queried in a traversal manner. However, this approach is slow. It is particularly disadvantageous for memories with low read speeds.

In the related art, one solution is to classify data and store each class of data in a separate database. However, the occupied storage space is multiplied, and the amounts of data of different types are difficult to balance, which easily wastes storage space. In effect, this scheme trades storage space for query speed.

In still other approaches, bloom filters may be utilized to increase the speed of data queries. A bloom filter may be used to determine whether certain data is in a certain set. A bloom filter may be understood as comprising a long binary array (a bit vector) and a series of random mapping functions (hash functions). When data is added to a set, the bloom filter may apply a hash function to the key of the data to obtain a corresponding hash value, and then change the value of the corresponding location (bit) in the binary array (typically changing the initial value 0 to 1) based on the obtained hash value. When querying data, it may be determined whether the data to be queried is in the set by examining the values of the bits corresponding to the key of the data. If the data does not exist in the set, the query in the set can be skipped, so that meaningless queries are avoided and the query speed is increased. Bloom filters take up little space and are efficient, and therefore save storage space compared with the aforementioned data classification scheme. However, a bloom filter may produce false positives. Specifically, if the bloom filter determines that certain data does not exist in a set, then the data definitely does not exist in the set; however, if the bloom filter determines that the data exists in a set, the determination has a certain probability of being wrong, because the hash function may map the keys of different data to the same value. Another disadvantage of the bloom filter is that it cannot reflect the deletion of data: the same bit of the bloom filter may correspond to multiple pieces of data, so if the value of the bit were changed due to the deletion of one piece of data, the determination results for other data could be affected. This may lead to situations where data that has been deleted from a set is judged by the bloom filter to still exist within the set. The more data that is deleted, the lower the accuracy of the bloom filter.
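The behaviour of a conventional bloom filter described above can be illustrated with a minimal Python sketch. It is not taken from the application: the class name BloomFilter, the use of salted SHA-256 as the family of hash functions, and the sizes chosen are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal conventional bloom filter: a bit array plus k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # conventional initial value of every bit: 0

    def _positions(self, key: bytes) -> list[int]:
        # Assumption: k hash functions are derived by salting SHA-256 with an index.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big")
            % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, key: bytes) -> None:
        # Adding data changes the bits for its hash values from 0 to 1.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] == 1 for pos in self._positions(key))


bf = BloomFilter(num_bits=256, num_hashes=3)
stored = {b"record-1"}
bf.add(b"record-1")

stored.discard(b"record-1")                     # the data is deleted from the set
still_reported = bf.might_contain(b"record-1")  # the filter cannot reflect this
```

After the deletion, the set no longer holds the record, yet the filter still answers "possibly present", which is the deletion problem the background describes.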

Therefore, in data query, there is still a need to provide a memory and a data query method to obtain higher accuracy while ensuring query speed.

Disclosure of Invention

According to an aspect of the present application, a memory device is provided. The memory device comprises at least one sub-region, wherein each sub-region of the at least one sub-region comprises a first bloom filter configured to record deletion of data within the corresponding sub-region and a second bloom filter configured to record insertion of data within the corresponding sub-region.

In some embodiments, a ratio of space taken up by the first bloom filter and the second bloom filter in total within a corresponding sub-region to storage space of the corresponding sub-region is less than or equal to a space ratio threshold.

In some embodiments, the first bloom filter is configured to, in response to deletion of deleted data from a corresponding sub-region, map a key of the deleted data to a hash value using a hash function of the first bloom filter, and cause a value of a bit corresponding to the hash value in the first bloom filter to be different from an initial value of the bit.

In some embodiments, the second bloom filter is configured to, in response to insertion of inserted data into a corresponding sub-region, map a key of the inserted data to a hash value using a hash function of the second bloom filter and cause a value of a bit within the second bloom filter corresponding to the hash value to be different from an initial value of the bit.

In some embodiments, the initial value of the bit is 1.

In some embodiments, the memory device comprises NOR-type flash memory.

According to another aspect of the present application, a method of querying data in a memory device is provided. The memory device comprises at least one sub-region, each of the at least one sub-region comprising a first bloom filter configured to record deletions of data within the corresponding sub-region, the at least one sub-region comprising at least one queried sub-region, wherein the method comprises: performing a query operation in the queried subregion, wherein the query operation comprises: determining, with a first bloom filter of the queried sub-region, whether data to be queried has a likelihood of having been deleted from the queried sub-region; in response to the data to be queried having a likelihood of having been deleted from the queried subregion, aborting the query operation for the queried subregion, and marking the queried subregion as a query abort subregion.

In some embodiments, determining, with the first bloom filter of the queried sub-region, whether the data to be queried has a likelihood of having been deleted from the queried sub-region comprises: calculating hash values of the key of the data to be queried using the hash function of the first bloom filter of the queried sub-region to obtain a first set of hash values; determining the values of the bits within the first bloom filter of the queried sub-region corresponding to the first set of hash values; and determining that the data to be queried has a likelihood of having been deleted from the queried sub-region in response to the values of the bits all being different from the initial value of the bits.

In some embodiments, the method further comprises: in response to deletion of deleted data from a corresponding sub-region, calculating hash values of keys of the deleted data by using a hash function of a first bloom filter of the corresponding sub-region to obtain a second set of hash values; and causing the value of the bit corresponding to the second set of hash values within the first bloom filter of the corresponding sub-region to be different from the initial value of the bit.

In some embodiments, each sub-region further comprises a second bloom filter configured to record the insertion of data within the corresponding sub-region, wherein the query operation further comprises: in response to the data to be queried not having a likelihood of having been deleted from the queried sub-region, determining, with the second bloom filter of the queried sub-region, whether the data to be queried has a likelihood of being present in the queried sub-region; in response to the data to be queried having a likelihood of being present in the queried sub-region, searching for the data to be queried in the queried sub-region in a traversal manner; and in response to the data to be queried not having a likelihood of being present in the queried sub-region, ending the query operation for the queried sub-region.

In some embodiments, determining, with the second bloom filter of the queried sub-region, whether the data to be queried has a likelihood of being present in the queried sub-region comprises: calculating hash values of the key of the data to be queried using the hash function of the second bloom filter of the queried sub-region to obtain a third set of hash values; determining the values of the bits within the second bloom filter of the queried sub-region corresponding to the third set of hash values; and determining that the data to be queried has a likelihood of being present in the queried sub-region in response to the values of the bits all being different from the initial value of the bits.

In some embodiments, the method further comprises: in response to added data being added to a corresponding sub-region, calculating hash values of the key of the added data using the hash function of the second bloom filter of the corresponding sub-region to obtain a fourth set of hash values; and causing the values of the bits corresponding to the fourth set of hash values within the second bloom filter of the corresponding sub-region to be different from the initial value of the bits.

In some embodiments, the initial value of the bit is 1.

In some embodiments, the method further comprises: after the query operation has been performed on all sub-regions of the memory device, in response to the data to be queried not yet having been found, searching for the data to be queried in at least one query abort sub-region in a traversal manner.

In some embodiments, the at least one sub-region comprises a first sub-region and a second sub-region, the second sub-region storing no data, the method further comprising: in response to the first bloom filter of the first sub-region indicating that deleted data exists in the first sub-region, moving data in the first sub-region other than the deleted data to the second sub-region; and performing an erase operation on the first sub-region.

Drawings

Embodiments of the present application will now be described in more detail and with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a block diagram of a memory device according to an embodiment of the present application;

FIG. 2 schematically illustrates a flow diagram of a method of querying data in a memory device according to an embodiment of the present application;

FIG. 3 schematically illustrates a flow chart of a method of querying data in a memory device according to an embodiment of the present application; and

FIG. 4 schematically shows a flow chart of a method of querying data in a memory device according to an embodiment of the application.

Detailed Description

The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings in the present application. The described embodiments are only some embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without inventive step, are within the scope of the present application.

According to an aspect of the present application, a memory device is provided. Fig. 1 schematically shows a block diagram of a memory device according to an embodiment of the present application. As shown in FIG. 1, memory device 100 includes at least one sub-region, such as sub-regions 101, 102, 103, etc. Each of the at least one sub-region comprises a first bloom filter 110 and a second bloom filter 120. The first bloom filter 110 is configured to record the deletion of data within the corresponding sub-area. The second bloom filter 120 is configured to record the insertion of data within the corresponding sub-area. The corresponding sub-region of the first bloom filter 110 may be understood as the sub-region where the first bloom filter 110 is located. The corresponding sub-region of the second bloom filter 120 may be understood as the sub-region where the second bloom filter 120 is located. It can be seen that in the memory device of the present application, two bloom filters are simultaneously arranged in each sub-area of the memory device, and the two bloom filters are respectively used for recording the insertion and deletion of data in the sub-area where the bloom filters are located.

The working principle of these two bloom filters is described below. When data is queried in a memory device, it is necessary to determine, sub-region by sub-region, whether the data to be queried exists in each sub-region. When determining whether the data to be queried exists in a sub-region, the first bloom filter 110 of the sub-region may first be used to determine whether the data to be queried has been deleted from the sub-region. On the one hand, if the first bloom filter 110 determines that the data to be queried has not been deleted from the sub-region, then the data to be queried has definitely not been deleted from the sub-region. It should be noted that two scenarios may lead to this result: the data to be queried was inserted into the sub-region and has remained there ever since, or the data to be queried was never inserted into the sub-region. On the other hand, if the first bloom filter 110 determines that the data to be queried has been deleted from the sub-region, the data may indeed have been deleted from the sub-region, but the result may also be a false positive. However, by controlling the number of data items in a sub-region, the size of the binary array (e.g., the number of bits) of the first bloom filter, and the number of hash functions of the first bloom filter, the false positive rate can be reduced, so that when the first bloom filter 110 determines that the data to be queried has been deleted from the sub-region, there is a high probability that the data has indeed been deleted from the sub-region.

Next, the subsequent operation may be chosen according to the determination result of the first bloom filter 110. Specifically, if the first bloom filter 110 determines that the data to be queried has been deleted from the sub-region, the search of the sub-region may be skipped and the sub-region may be marked. If the first bloom filter 110 determines that the data to be queried has not been deleted from the sub-region, the second bloom filter 120 can then be used to determine whether the data to be queried exists in the sub-region.

The second bloom filter 120 may yield one of two determination results. If the second bloom filter 120 determines that the data to be queried does not exist in the sub-region, the data definitely does not exist in the sub-region, in which case no search needs to be performed in the sub-region. If the second bloom filter 120 determines that the data to be queried exists in the sub-region, the data may indeed exist in the sub-region, and the data to be queried may then be searched for in the sub-region in a traversal manner, for example by comparing the key of the data to be queried with the keys of the data in the sub-region one by one. Similarly, when the second bloom filter 120 determines that the data to be queried exists in the sub-region, there is a certain probability of a false positive, but the false positive rate may also be reduced by controlling the number of data items in the sub-region, the size of the binary array of the second bloom filter, and the number of hash functions of the second bloom filter.
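The decision sequence described above for a single sub-region can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the names SubRegion, Decision and positions, the salted SHA-256 hash family, the dictionary standing in for stored records, and the initial bit value of 0 are hypothetical choices, not details given in the application.

```python
import hashlib
from enum import Enum


def positions(key: bytes, num_bits: int, num_hashes: int, salt: bytes) -> list[int]:
    """Hypothetical hash family: SHA-256 salted with a filter tag and an index."""
    return [
        int.from_bytes(hashlib.sha256(salt + bytes([i]) + key).digest()[:8], "big")
        % num_bits
        for i in range(num_hashes)
    ]


class Decision(Enum):
    SKIP_MAYBE_DELETED = "skip the sub-region and mark it"
    SKIP_NOT_PRESENT = "skip the sub-region, data was never inserted"
    TRAVERSE = "search the sub-region record by record"


class SubRegion:
    """One sub-region with its records and its two bloom filters."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.deleted_bits = [0] * num_bits   # first bloom filter: deletions
        self.inserted_bits = [0] * num_bits  # second bloom filter: insertions
        self.records: dict[bytes, bytes] = {}

    def _mark(self, bits: list[int], key: bytes, salt: bytes) -> None:
        for pos in positions(key, self.num_bits, self.num_hashes, salt):
            bits[pos] = 1

    def _hit(self, bits: list[int], key: bytes, salt: bytes) -> bool:
        return all(
            bits[pos] == 1
            for pos in positions(key, self.num_bits, self.num_hashes, salt)
        )

    def insert(self, key: bytes, value: bytes) -> None:
        self.records[key] = value
        self._mark(self.inserted_bits, key, b"ins")

    def delete(self, key: bytes) -> None:
        self.records.pop(key, None)
        self._mark(self.deleted_bits, key, b"del")

    def decide(self, key: bytes) -> Decision:
        # The first filter is consulted first; the second only if it reports no deletion.
        if self._hit(self.deleted_bits, key, b"del"):
            return Decision.SKIP_MAYBE_DELETED
        if not self._hit(self.inserted_bits, key, b"ins"):
            return Decision.SKIP_NOT_PRESENT
        return Decision.TRAVERSE


region = SubRegion()
empty_answer = region.decide(b"a")      # nothing inserted yet
region.insert(b"a", b"1")
present_answer = region.decide(b"a")    # inserted, nothing deleted
region.delete(b"a")
deleted_answer = region.decide(b"a")    # deletion recorded by the first filter
```

Only the TRAVERSE outcome triggers a record-by-record search, so a sub-region is read in full only when neither filter rules it out.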

By arranging, in each sub-region of the memory device, two bloom filters that respectively record data insertion and data deletion in the corresponding sub-region, the memory device can determine whether data has been deleted from a given sub-region, so that the deletion of data does not degrade query results. After the first bloom filter determines that the data to be queried has been deleted from a given sub-region, the search of that sub-region can be skipped directly, so that futile searches are avoided. When the first bloom filter determines that the data to be queried has not been deleted from the corresponding sub-region, the second bloom filter is then used to determine whether the data to be queried exists in the sub-region. Therefore, the memory device according to the embodiments of the application offers higher data query speed and efficiency. Moreover, because the first bloom filter records data deletion, data that has been deleted from the corresponding sub-region is not considered to still exist in the sub-region, and therefore the accuracy of data queries is also higher.

As mentioned above, by controlling the number of data items in the sub-region and the size of the binary array of the bloom filter, the false positive rate can be reduced. The more data items in the sub-region, the higher the false positive rate; the larger the binary array, the lower the false positive rate. Therefore, the false positive rate of the bloom filter is positively correlated with the number of data items in the sub-region and negatively correlated with the size of the binary array.
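These correlations can be made concrete with the widely used textbook approximation of the false positive rate of a bloom filter, p ≈ (1 - e^(-kn/m))^k, where n is the number of recorded items, m is the number of bits and k is the number of hash functions. The application itself does not state this formula; the Python sketch below, including the function name false_positive_rate and the sample sizes, is an illustrative assumption.

```python
import math


def false_positive_rate(num_items: int, num_bits: int, num_hashes: int) -> float:
    """Textbook approximation p = (1 - exp(-k * n / m)) ** k (not from the application)."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# More items in the same array raise the rate; a larger array lowers it.
rate_10_items = false_positive_rate(num_items=10, num_bits=256, num_hashes=3)
rate_20_items = false_positive_rate(num_items=20, num_bits=256, num_hashes=3)
rate_larger_array = false_positive_rate(num_items=10, num_bits=512, num_hashes=3)
```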

Since the number of data items in the sub-region depends on the storage space of the sub-region, and the size of the binary array depends on the storage space occupied by the bloom filter, in some embodiments the false positive rate of the bloom filter may be considered to be positively correlated with the storage space of the sub-region and negatively correlated with the storage space occupied by the bloom filter. In such an embodiment, the false positive rate of the bloom filter can be kept within a reasonable range by setting the ratio of the space occupied by the bloom filter in the sub-region to the storage space of the sub-region. For example, when this ratio is on the order of one thousandth or more (such as 1/2048, 1/1024, etc.), the false positive rate of the bloom filter is low. However, it should be noted that although a higher ratio gives a lower false positive rate, a ratio that is too high means that the amount of stored data covered by the bloom filter is small; the capacity of the bloom filter is then excessive, which wastes computing resources and reduces query efficiency. Therefore, by setting the ratio of the space occupied in total by the first bloom filter and the second bloom filter in the sub-region to the storage space of the sub-region to be less than or equal to the space ratio threshold, a balance between the accuracy and the efficiency of data queries can be obtained. The term "space ratio threshold" may be understood as follows: when the ratio of the space occupied in total by the two bloom filters in a sub-region to the storage space of the sub-region is greater than the threshold, the bloom filters are able to record a very large number of data insertions and deletions, but because the storage space of the sub-region is limited, that many insertions and deletions cannot occur in the sub-region, and the capacity of the bloom filters is largely wasted; when the ratio is less than or equal to the threshold, the number of insertions and deletions that the bloom filters can record matches the number that may actually occur in the sub-region, so that this waste is avoided.

In some embodiments, the space ratio threshold may be 1/32, that is, the ratio of the space occupied in total by the first bloom filter 110 and the second bloom filter 120 in the corresponding sub-region to the storage space of the corresponding sub-region is less than or equal to 1/32; for example, the ratio is about 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024, or 1/2048. The inventors have found that by setting the ratio of the space occupied by the bloom filters to the storage space of the corresponding sub-region within the above range, a balance can be struck between the false positive rate and the efficiency of the bloom filter. In a more specific embodiment, the ratio of the space occupied in total by the first bloom filter 110 and the second bloom filter 120 within the corresponding sub-region to the storage space of the corresponding sub-region is about 1/64. For example, when the storage space of a sub-region is 4 KB, the two bloom filters may occupy 64 bytes of space in total.
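The arithmetic of the 4 KB example can be checked with a short Python sketch. The function name filter_budget is hypothetical, and the even split of the budget between the two filters is an assumption; the application only gives the total.

```python
from fractions import Fraction

SPACE_RATIO_THRESHOLD = Fraction(1, 32)  # threshold named in this embodiment


def filter_budget(sub_region_bytes: int, ratio: Fraction) -> tuple[int, int]:
    """Return (total bytes for both filters, bits per filter).

    Assumption: the budget is split evenly between the first and second filter.
    """
    if ratio > SPACE_RATIO_THRESHOLD:
        raise ValueError("ratio exceeds the space ratio threshold")
    total_bytes = int(sub_region_bytes * ratio)
    bits_per_filter = (total_bytes // 2) * 8
    return total_bytes, bits_per_filter


# 4 KB sub-region with a ratio of 1/64: 64 bytes in total.
total_bytes, bits_per_filter = filter_budget(4096, Fraction(1, 64))
```

Under the even-split assumption, each filter in this example would have 256 bits.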

In order for the first bloom filter 110 and the second bloom filter 120 to function in the data query process, the first bloom filter 110 must accurately record each deletion of data from a sub-region, and the second bloom filter 120 must accurately record each insertion of data into a sub-region. The following describes data deletion and data insertion in turn.

As shown in FIG. 1, when data is deleted from the sub-region, the area where the data is located is not erased directly; instead, the header of the data is marked to indicate that the data is invalid. Then, a hash value of the key of the data is calculated using the hash function of the first bloom filter 110. This process may also be referred to as a mapping process, and the resulting hash value may also be referred to as digest information. In some embodiments, the bloom filter includes a plurality of hash functions, each of which maps the key of the data, yielding a set of hash values. Then, the value of the bit corresponding to each hash value in the first bloom filter 110 is made different from the initial value of the bit. A bit is a binary digit having a value of 0 or 1. Thus, if the initial value of a bit is 0, the value of the corresponding bit of the first bloom filter changes from 0 to 1 when the data is deleted from the sub-region; conversely, if the initial value is 1, the value changes from 1 to 0. Either way, the change represents that the data has been deleted from the sub-region. It should be noted that if a bit corresponding to a hash value of the deleted data no longer has the initial value (indicating that other data has already been deleted from the sub-region and that at least one hash value of that other data equals at least one hash value of the data being deleted), the value of the bit does not need to be changed again; that is, the bit keeps its value different from the initial value.

Similarly, when data is inserted into a sub-region, the key of the inserted data is mapped to a hash value by the hash function of the second bloom filter 120 of the sub-region, and the value of the bit corresponding to the hash value in the second bloom filter 120 is made different from the initial value of the bit to represent that the data is inserted into the sub-region.
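The recording of a deletion described above can be sketched in Python as follows. The Record class with its valid flag stands in for the header marking, and hash_positions is a hypothetical salted SHA-256 hash family; neither is specified by the application. The initial bit value is a parameter so that both the 0-to-1 and the 1-to-0 convention are covered.

```python
import hashlib
from dataclasses import dataclass

INITIAL = 0            # initial bit value; a flash-backed filter would use 1
CHANGED = 1 - INITIAL  # the value a bit takes once an event has been recorded


@dataclass
class Record:
    key: bytes
    value: bytes
    valid: bool = True  # header flag: cleared on deletion instead of erasing


def hash_positions(key: bytes, num_bits: int, num_hashes: int = 3) -> list[int]:
    """Hypothetical hash family: SHA-256 salted with the hash function index."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % num_bits
        for i in range(num_hashes)
    ]


def record_event(bits: list[int], key: bytes) -> int:
    """Move the key's bits away from INITIAL; return how many bits actually changed."""
    changed = 0
    for pos in hash_positions(key, len(bits)):
        if bits[pos] == INITIAL:  # bits already changed are left as they are
            bits[pos] = CHANGED
            changed += 1
    return changed


def delete_record(record: Record, first_filter: list[int]) -> None:
    record.valid = False                    # mark the header, do not erase
    record_event(first_filter, record.key)  # record the deletion in the first filter


first_filter = [INITIAL] * 128
rec = Record(key=b"sensor-42", value=b"payload")
delete_record(rec, first_filter)
changed_on_repeat = record_event(first_filter, rec.key)  # nothing left to change
```

Recording an insertion in the second bloom filter would use the same record_event helper on that filter's bit array.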

In some embodiments, the hash function of the first bloom filter and the hash function of the second bloom filter within the same sub-region may be the same. In this way, the key of the data to be queried needs to be mapped only once, which simplifies the procedure and saves computing resources. In other embodiments, the hash function of the first bloom filter and the hash function of the second bloom filter within the same sub-region may be different. This can reduce the probability of hash collisions.
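For the shared-hash embodiment, the saving can be sketched as follows: the key is mapped once and the resulting bit positions are checked in both filters. This Python sketch assumes that both filters have the same number of bits and an initial bit value of 0; the names positions and check_both and the salted SHA-256 hash family are hypothetical.

```python
import hashlib


def positions(key: bytes, num_bits: int, num_hashes: int = 3) -> list[int]:
    """Hypothetical shared hash family used by both filters of a sub-region."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % num_bits
        for i in range(num_hashes)
    ]


def check_both(deleted_bits: list[int], inserted_bits: list[int], key: bytes) -> tuple[bool, bool]:
    """Return (maybe_deleted, maybe_present) after mapping the key a single time."""
    pos = positions(key, len(deleted_bits))  # computed once, reused for both filters
    maybe_deleted = all(deleted_bits[p] == 1 for p in pos)
    maybe_present = all(inserted_bits[p] == 1 for p in pos)
    return maybe_deleted, maybe_present


deleted_bits = [0] * 128
inserted_bits = [0] * 128
for p in positions(b"k", 128):  # record the insertion of key b"k"
    inserted_bits[p] = 1

result = check_both(deleted_bits, inserted_bits, b"k")
```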

In some embodiments, when inserting data into the memory device, sequential writing may be used, i.e., data is written to the next sub-region only after the previous sub-region is full. In this way, wear is balanced across the sub-regions and space utilization is high.

A bloom filter, as a data structure, may be used with any type of storage medium. For example, a memory device according to embodiments of the present application may be a non-volatile memory, which keeps data safe in the event of an unexpected power loss. In a more specific embodiment, the memory device according to the embodiment of the present application may be a flash memory (Flash). The flash memory uses a double-gate transistor as its basic storage unit, in which a control gate, an oxide layer, a floating gate layer and a tunnel oxide layer are arranged in that order from the side far from the substrate to the side close to the substrate. A source and a drain are arranged side by side between the tunnel oxide layer and the substrate. Current can only be conducted in one direction, from the source to the drain. The floating gate layer is the key to recording data in the flash memory. When the floating gate is charged with electrons, the memory cell is in a programmed (written) state. When a memory cell in the written state is read, a conductive channel exists between the source and the drain due to the induction effect of the electrons in the floating gate, and the bit value read from the drain is binary 0. When the electrons in the floating gate have been "pulled" out, the memory cell is in an erased state. When a memory cell in the erased state is read, the bit value read from the drain is binary 1, since there is no conductive channel between the source and the drain. Before data is written, the flash memory needs to be initialized; specifically, electrons are drained from all floating gates, that is, all bits are assigned the value 1. Writing can only change a bit whose value is 1 to 0; it cannot change a bit that is already 0 back to 1. Thus, the initial value of a bit of the flash memory is binary 1. When the memory device of the embodiment of the present application is a flash memory and data in a sub-region is deleted, the values of the bits in the first bloom filter of the sub-region that correspond to the hash values of the key of the deleted data are changed from 1 to 0. When data is added to a sub-region, the values of the bits in the second bloom filter of the sub-region that correspond to the hash values of the key of the added data are changed from 1 to 0.
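The one-way write behaviour described above can be modelled in a few lines of Python. The class NorFlashRegion and the helper clear_bit are hypothetical names for a toy model, not a driver for any real device; programming is modelled as a bitwise AND with the stored byte, and the least-significant-bit-first numbering inside a byte is an assumption.

```python
ERASED_BYTE = 0xFF  # after initialization every bit reads as 1


class NorFlashRegion:
    """Toy model of a flash region: programming can only change bits from 1 to 0."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray([ERASED_BYTE] * size)

    def program(self, offset: int, value: int) -> None:
        if not 0 <= value <= 0xFF:
            raise ValueError("value must fit in one byte")
        # A bit that is already 0 stays 0 regardless of the value written.
        self.cells[offset] &= value

    def erase(self) -> None:
        # Only an erase of the whole region returns bits to 1.
        for i in range(len(self.cells)):
            self.cells[i] = ERASED_BYTE


def clear_bit(region: NorFlashRegion, bit_index: int) -> None:
    """Record an event in a bloom filter stored in flash: change one bit from 1 to 0."""
    byte_offset, bit_in_byte = divmod(bit_index, 8)
    region.program(byte_offset, 0xFF & ~(1 << bit_in_byte))


region = NorFlashRegion(size=4)
clear_bit(region, 10)                  # bit 2 of byte 1
after_clear = region.cells[1]
region.program(1, 0xFF)                # trying to write the bit back to 1 has no effect
after_rewrite = region.cells[1]
region.erase()
after_erase = region.cells[1]
```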

The inventors of the present application have found that, among many types of memory, embodiments of the present application are particularly suitable for NOR-type flash memory. NOR-type flash memory has complete data lines and address lines, so that random reading and writing in units of bytes can be realized. In addition, NOR-type flash memory offers low cost, low system power consumption, and adaptability to harsh environments. Small embedded devices cannot be equipped with storage such as a disk, as a large server or a personal computer can, yet they often need to manage a certain amount of data; NOR-type flash memory is then a preferred choice. Embodiments of the present application are particularly applicable to NOR-type flash memory because the read and write characteristics of NOR-type flash memory are consistent with the operation of bloom filters. Specifically, NOR-type flash memory supports only bit changes from 1 to 0, i.e., only unidirectional bit modification, which is all that recording an event in a bloom filter requires. When the bloom filter is used to query data, NOR-type flash memory can read data randomly by byte, so the byte containing the bit that corresponds to a hash value of the key of the data to be queried can be located directly and read out, and the value of the bit can be extracted by shift and logic operations; the reading speed is therefore high. Moreover, although NOR-type flash memory is traditionally considered unsuitable for use as a database because of its slow write and erase speeds, the present application combines NOR-type flash memory with bloom filters so that each exploits the advantages of the other, enabling high performance in cases where NOR-type flash memory has to be used.
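The byte-addressed read followed by shift and logic operations can be sketched in Python as below. The function names read_bit and maybe_recorded are hypothetical, the bytes object stands in for a filter stored in flash, and the least-significant-bit-first numbering inside each byte is an assumption.

```python
def read_bit(filter_bytes: bytes, bit_index: int) -> int:
    """Locate the byte holding the bit, then extract the bit by shift and mask."""
    byte_offset = bit_index >> 3  # which byte to read
    shift = bit_index & 7         # position of the bit inside that byte
    return (filter_bytes[byte_offset] >> shift) & 1


def maybe_recorded(filter_bytes: bytes, bit_indexes: list[int]) -> bool:
    """With an initial bit value of 1, an event is possibly recorded when all bits are 0."""
    return all(read_bit(filter_bytes, i) == 0 for i in bit_indexes)


# Two filter bytes; only bit 10 (bit 2 of the second byte) has been changed to 0.
stored = bytes([0xFF, 0xFB])
bit_10 = read_bit(stored, 10)
bit_9 = read_bit(stored, 9)
```

Only one byte is read per hash value, which is why byte-level random access suits this kind of lookup.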

In an embodiment of the present application, the bloom filters may be arranged in units of sectors. For example, each sub-region may be a single sector of a NOR-type flash memory. When the number of sectors is too large, adjacent sectors may be combined into one logical sector, in which case the sub-region is a combination of a plurality of sectors. In still other embodiments, bloom filters may also be arranged in units of blocks (banks).

In summary, the embodiment of the present application provides a memory device, wherein the memory device includes two bloom filters respectively used for recording data insertion and deletion, so that it is possible to determine that data is deleted from a certain sub-area, and skip the sub-area in which the data to be queried is not located in time, thereby improving the accuracy and speed of data query. In addition, the embodiment of the application further determines the proportion between the space occupied by the bloom filter and the storage space of the sub-area, and can balance the misjudgment rate and the efficiency of the bloom filter. In addition, the application also finds the special matching between the bloom filter and the NOR type flash memory, so that the NOR type flash memory which is originally not suitable for being used as a database can be improved in performance by virtue of the advantages of the bloom filter.

According to another aspect of the present application, a method of querying data in a memory device is provided. The method may be implemented by means of a memory device according to an embodiment of the application.

FIG. 2 schematically shows a flow chart of a method of querying data in a memory device according to an embodiment of the application. In a memory device to which the method is applicable, the memory device comprises at least one sub-region, each of the at least one sub-region comprising a first bloom filter configured to record deletion of data within the corresponding sub-region. The at least one sub-region includes at least one queried sub-region. As shown in fig. 2, the method includes, in step S205, performing a query operation in the queried sub-region. That is, at least one of the sub-regions included in the memory is subjected to a query operation. The query operation comprises:

in step S305, determining whether the data to be queried has a possibility of being deleted from the queried subregion by using the first bloom filter of the queried subregion;

in step S310, in response to the data to be queried having the possibility of being deleted from the queried sub-area, the query operation of the queried sub-area is aborted, and the queried sub-area is marked as a query abort sub-area.

These steps are briefly described below. First, the first bloom filter of the queried sub-region is used to determine whether the data to be queried may have been deleted from that sub-region (step S305). If the result is that the data to be queried has no possibility of having been deleted from the queried sub-region, the result is definite: the data to be queried has certainly not been deleted from the queried sub-region. If the result is that the data to be queried has a possibility of having been deleted from the sub-region, the result is not definite: the data may or may not have been deleted from the sub-region. However, the certainty of this result can be improved by the method described earlier in the present application, so that when the first bloom filter reports a possibility of deletion, the data to be queried is more likely to have actually been deleted from the sub-region.
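The asymmetry described above is the usual behaviour of a bloom filter: a negative answer is exact, while a positive answer is wrong with some probability. As a rough guide only, and assuming ideal, independent hash functions (an assumption not stated in this application), the standard textbook approximation of that probability for a filter of m bits with k hash functions and n recorded keys is (1 - e^(-kn/m))^k. The sketch below evaluates it; the function name and the sample sizes are hypothetical.

```python
import math

def false_positive_rate(m_bits: int, k_hashes: int, n_keys: int) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k; assumes ideal, independent hashes."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes

# Illustrative sizes only: a deletion filter with 3 hash functions.
few_deletions = false_positive_rate(1024, 3, 10)
many_deletions = false_positive_rate(1024, 3, 100)
larger_filter = false_positive_rate(4096, 3, 100)
```

Under this approximation, recording more deletions in a filter of fixed size raises the misjudgment rate, and enlarging the filter lowers it, which is the trade-off against storage space mentioned earlier.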

If the result of step S305 is that the data to be queried has a possibility of having been deleted from the queried sub-region, the query operation of the queried sub-region is aborted, and the queried sub-region is marked as a query abort sub-region (step S310). Because the data to be queried may have been deleted from the queried sub-region (and this result can be made fairly reliable by adjusting the parameters of the bloom filter), the data can be treated as not present in the sub-region. The query for this sub-region is therefore skipped, and the query in another sub-region starts as early as possible, which improves the query speed.

FIG. 3 schematically shows a flow chart of a method of querying data in a memory device according to an embodiment of the application. In some embodiments, each sub-region of the memory device also includes a second bloom filter. The second bloom filter is configured to record insertions of data within the corresponding sub-region. The query operation further comprises:

in step S315, in response to the data to be queried not having a possibility of being deleted from the queried sub-region, determining whether the data to be queried has a possibility of being present in the queried sub-region using a second bloom filter of the queried sub-region;

in step S320, in response to the data to be queried having the possibility of being present in the queried subarea, searching the queried subarea for the data to be queried in a traversal manner;

in step S325, in response to the data to be queried having no possibility of being present in the queried sub-area, the query operation of the queried sub-area is ended.

These steps are briefly described below. Step S305 has two possible results. One result, as described above, is that the data to be queried has a possibility of having been deleted from the queried sub-region; in that case step S310 is executed to abort the query operation of the queried sub-region and mark it as a query abort sub-region. The other result is that the data to be queried has no possibility of having been deleted from the queried sub-region. The former result is not definite, but the latter is: if step S305 determines that the data to be queried has no possibility of having been deleted from the queried sub-region, the data has certainly not been deleted from it. Two situations can lead to this result. In the first, the data to be queried was inserted into the sub-region and has remained stored there. In the second, the data to be queried was never inserted into the sub-region. These two situations must be distinguished. Therefore, in step S315, in response to the data to be queried having no possibility of having been deleted from the queried sub-region, the second bloom filter of the queried sub-region is used to determine whether the data to be queried has a possibility of being present in the queried sub-region. If the result of step S315 is that the data has a possibility of being present in the queried sub-region, the screening result corresponds to the first situation. The data to be queried may then be searched for in the queried sub-region in a traversal manner, for example by traversing all data key values in the sub-region to find the one that matches the key value of the data to be queried. However, this result of step S315 is not definite; that is, the data to be queried may in fact not exist in the queried sub-region. As with the first bloom filter, the certainty of the result can be raised by tuning the aforementioned parameters of the second bloom filter, so that when the second bloom filter reports a possibility of presence, the data to be queried is more likely to actually be present in the queried sub-region. If the data to be queried is then found in the queried sub-region, the data query process ends with that data as its result, that is, it ends normally.

If the result of step S315 is that the data to be queried has no possibility of being present in the queried sub-region, the screening result corresponds to the second situation. This result is definite: if step S315 determines that the data to be queried has no possibility of being present in the queried sub-region, the data is certainly not present in it. The query operation for the queried sub-region may then be ended, and the query operation proceeds to the next queried sub-region.

Once it has been determined that the data to be queried has not been deleted from the sub-region, these steps use the second bloom filter to further determine whether the data has a possibility of being present in the queried sub-region, and take different subsequent actions depending on the result. The queried sub-region is searched in a traversal manner only when the data to be queried has a possibility of being present in it. Sub-regions that are known not to contain the data to be queried are therefore not searched, which shortens the query time and improves the data query speed.
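The query operation of steps S305 to S325 within a single sub-region can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `SubRegion` class, the status strings, the 256-bit filter size, and the use of salted SHA-256 as a stand-in for the filters' hash functions are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

INITIAL = 1  # assumed erased-flash value of every filter bit

def bit_positions(key: str, m: int, k: int = 3) -> list[int]:
    """Salted SHA-256 stands in for the filter's k hash functions (an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def all_changed(bits: list, key: str) -> bool:
    """True if every bit selected by the key differs from the initial value."""
    return all(bits[p] != INITIAL for p in bit_positions(key, len(bits)))

@dataclass
class SubRegion:
    records: dict = field(default_factory=dict)                           # valid key -> value
    deleted_bits: list = field(default_factory=lambda: [INITIAL] * 256)   # first bloom filter
    inserted_bits: list = field(default_factory=lambda: [INITIAL] * 256)  # second bloom filter

def query_subregion(region: SubRegion, key: str):
    """Return (status, value) for one sub-region, following steps S305 to S325."""
    if all_changed(region.deleted_bits, key):          # S305: possibly deleted?
        return "aborted", None                         # S310: caller marks the sub-region
    if not all_changed(region.inserted_bits, key):     # S315: possibly present?
        return "ended", None                           # S325: never inserted here
    for stored_key, value in region.records.items():   # S320: traversal
        if stored_key == key:
            return "found", value
    return "ended", None                               # the second filter misjudged

region = SubRegion()
region.records["sensor-7"] = b"42"
for p in bit_positions("sensor-7", 256):
    region.inserted_bits[p] = 0                        # insertion recorded as 1 -> 0
```

Only the middle branch leads to a traversal; the other two branches return after reading a handful of filter bits, which is where the time saving comes from.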

The process by which the first bloom filter determines whether the data to be queried has been deleted from the sub-area and the process by which the first bloom filter records that the data has been deleted from the sub-area will be described in more detail below.

In some embodiments, determining whether the data to be queried has a likelihood of having been deleted from the queried sub-region using the first bloom filter of the queried sub-region comprises: calculating hash values of the key of the data to be queried using the hash functions of the first bloom filter of the queried sub-region to obtain a first set of hash values; determining the values of the bits within the first bloom filter corresponding to the first set of hash values; and, in response to the values of the bits all being different from the initial value, determining that the data to be queried has a likelihood of having been deleted from the queried sub-region. In these steps, once the key of the data to be queried has been obtained, it is mapped through each hash function of the first bloom filter, and each hash function yields one hash value. The bits at the positions corresponding to these hash values in the binary array of the first bloom filter are then read, and it is determined whether their values all differ from the initial value. If the values of the bits all differ from the initial value, the data to be queried is considered to have a likelihood of having been deleted from the queried sub-region. In that case, according to step S310, the query operation of the queried sub-region may be aborted, and the queried sub-region is marked as a query abort sub-region and assigned a query abort sub-region number m. If the value of at least one bit is the same as the initial value, the data to be queried is considered to have no likelihood of having been deleted from the queried sub-region. In that case, the second bloom filter may be used, as per step S315, to determine whether the data to be queried has a possibility of being present in the queried sub-region.
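The bit test described above can be sketched as follows, assuming an initial bit value of 1 and using salted SHA-256 as a stand-in for the hash functions of the first bloom filter. The function names, the 64-bit filter size, and the choice of three hash functions are hypothetical.

```python
import hashlib

INITIAL = 1  # assumed initial (erased) value of every filter bit

def bit_positions(key: str, m: int, k: int = 3) -> list[int]:
    """Salted SHA-256 stands in for the filter's k hash functions (an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def possibly_deleted(deleted_bits: list, key: str) -> bool:
    """True only if every bit selected by the key differs from its initial value."""
    first_set = bit_positions(key, len(deleted_bits))   # first set of hash values
    return all(deleted_bits[p] != INITIAL for p in first_set)

# A filter with no deletion recorded, and one with the deletion of "key-A" recorded.
untouched = [INITIAL] * 64
recorded = [INITIAL] * 64
for p in bit_positions("key-A", 64):
    recorded[p] = 0

# A filter in which every bit has changed except one of the bits selected by "key-A".
one_bit_initial = [0] * 64
one_bit_initial[bit_positions("key-A", 64)[0]] = INITIAL
```

A single bit still at its initial value is enough for a definite negative answer, which is why the third filter reports that "key-A" has not been deleted even though almost every bit has changed.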

For the first bloom filter to perform the above function, it must record the deletion of data from the sub-region. The process by which the first bloom filter records the deletion of data from the sub-region may comprise the following steps. First, when a piece of data is deleted from a sub-region, the storage area corresponding to that piece of data is not erased; instead, the piece of data is marked as invalid in its header. In response to the data being deleted, hash values of the key of the deleted data are calculated using the hash functions of the first bloom filter of the sub-region in which the data is located, to obtain a second set of hash values. Then, the values of the bits within the first bloom filter of the sub-region corresponding to the second set of hash values are made different from the initial value of the bits. For example, when the memory is a flash memory, the initial value of each bit is binary 1. When data is deleted from a sub-region, the bits of the first bloom filter of the sub-region corresponding to the hash values of the key of the data are set to binary 0.
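The deletion procedure described above can be sketched as follows. The dictionary layout of a sub-region, the `valid` header flag, and the salted SHA-256 hash are hypothetical choices made for illustration; the application does not specify a record format or particular hash functions.

```python
import hashlib

INITIAL = 1   # erased flash bits read as binary 1
K = 3         # number of hash functions (assumed)

def bit_positions(key: str, m: int, k: int = K) -> list[int]:
    """Salted SHA-256 stands in for the first bloom filter's hash functions (an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def delete_record(region: dict, key: str) -> bool:
    """Mark the record invalid in place and record the deletion in the first bloom filter."""
    for record in region["records"]:
        if record["valid"] and record["key"] == key:
            record["valid"] = False                      # header flag only; nothing is erased
            bits = region["deleted_bits"]
            for p in bit_positions(key, len(bits)):      # second set of hash values
                bits[p] = 0                              # 1 -> 0, now different from INITIAL
            return True
    return False

region = {
    "records": [{"key": "sensor-7", "value": b"42", "valid": True}],
    "deleted_bits": [INITIAL] * 64,
}
deleted = delete_record(region, "sensor-7")
```

The record stays in the list after deletion; only its flag and at most `K` filter bits change, so no erase operation is needed at deletion time.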

The process in which the second bloom filter determines whether or not data to be queried exists in the sub-area and the process in which the second bloom filter records that data is added to the sub-area will be described below.

In some embodiments, determining whether the data to be queried has a likelihood of being present in the queried sub-region using the second bloom filter of the queried sub-region comprises: calculating hash values of the key of the data to be queried using the hash functions of the second bloom filter of the queried sub-region to obtain a third set of hash values; determining the values of the bits within the second bloom filter corresponding to the third set of hash values; and, in response to the values of the bits all being different from the initial value, determining that the data to be queried has a likelihood of being present in the queried sub-region. In some embodiments, if the first and second bloom filters of a sub-region use the same hash functions, the third set of hash values is the same as the first set of hash values, and the first set of hash values may be used directly as the third set. The bits at the positions corresponding to the third set of hash values in the binary array of the second bloom filter are then read, and it is determined whether their values all differ from the initial value. If the values of the bits all differ from the initial value, the data to be queried is considered to have a likelihood of being present in the queried sub-region. In that case, the data to be queried may be searched for in the queried sub-region in a traversal manner according to step S320. If the value of at least one bit is the same as the initial value, the data to be queried is considered to have no likelihood of being present in the queried sub-region. In that case, the query operation of the queried sub-region may be ended according to step S325.

Similar to the way the first bloom filter records the deletion of data from the sub-region, the process by which the second bloom filter records the addition of data to the sub-region may comprise the following steps. When data is added to a sub-region, hash values of the key of the added data are first calculated using the hash functions of the second bloom filter of the sub-region to obtain a fourth set of hash values. Then, the values of the bits within the second bloom filter of the sub-region corresponding to the fourth set of hash values are made different from the initial value of the bits. For example, when the memory is a flash memory, the initial value of each bit is binary 1. When data is added to a sub-region, the bits of the second bloom filter of the sub-region corresponding to the hash values of the key of the data are set to binary 0.
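Recording an insertion can be sketched on a packed byte array, which is closer to how a filter would sit in flash than a list of integers. Programming a flash cell can only move a bit from 1 to 0, and an AND with a mask does exactly that. The helper names, the 16-byte filter size, and the salted SHA-256 hash are hypothetical.

```python
import hashlib

def bit_positions(key: str, m: int, k: int = 3) -> list[int]:
    """Salted SHA-256 stands in for the second bloom filter's hash functions (an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def clear_bit(buf: bytearray, pos: int) -> None:
    """Program one bit from 1 to 0, the only direction a flash write can move a bit."""
    buf[pos // 8] &= ~(1 << (pos % 8)) & 0xFF

def bit_is_set(buf: bytearray, pos: int) -> bool:
    """Read back a single bit of the packed filter."""
    return bool((buf[pos // 8] >> (pos % 8)) & 1)

def record_insertion(inserted_filter: bytearray, key: str) -> None:
    """Clear the bits selected by the fourth set of hash values."""
    for p in bit_positions(key, len(inserted_filter) * 8):
        clear_bit(inserted_filter, p)

second_filter = bytearray(b"\xff" * 16)   # erased state: every bit is 1
record_insertion(second_filter, "key-A")
```

Because the update is a pure AND, recording the same key twice leaves the filter unchanged, and no bit ever has to return to 1 until the whole sub-region is erased.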

FIG. 4 schematically shows a flow chart of a method of querying data in a memory device according to an embodiment of the application. As shown in FIG. 4 and as described above with reference to steps S305 to S325, if the first bloom filter determines that the data to be queried has a possibility of having been deleted from the queried sub-region, or if the second bloom filter determines that the data to be queried has no possibility of being present in the queried sub-region, the query operation of the queried sub-region is aborted or ended, and the query operation continues in a sub-region that has not yet been queried. In addition, if the data to be queried is not found after the queried sub-region has been searched in a traversal manner, the query operation of the queried sub-region is likewise ended, and the query operation continues in a sub-region that has not yet been queried. In a specific operating procedure, as shown in FIG. 4 for example, step S205 may be performed in the sub-region numbered n. If the query operation in that sub-region is aborted or otherwise ended, 1 is added to the value of n, and the updated value is taken as the new value of n (in a programming language, this can be expressed as n = n + 1). The next execution of step S205 in the sub-region numbered n then takes place in the next sub-region, and the process loops until the data to be queried is found or until all sub-regions have been queried without finding it.
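The outer loop over sub-regions can be sketched as follows. To keep the focus on the n = n + 1 iteration, the per-sub-region query operation is passed in as a callable; `query_all`, the status strings, and the stub used in the example are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Result = Tuple[str, Optional[bytes]]  # ("found" | "aborted" | "ended", value)

def query_all(regions: Sequence[object], key: str,
              query_one: Callable[[object, str], Result]):
    """Run the query operation in sub-region n, then n + 1, until found or exhausted."""
    aborted = []                                      # numbers of query abort sub-regions
    n = 0
    while n < len(regions):
        status, value = query_one(regions[n], key)    # step S205 in sub-region n
        if status == "found":
            return value, aborted                     # normal end
        if status == "aborted":
            aborted.append(n)                         # remembered for a later recheck
        n = n + 1
    return None, aborted                              # not found in any sub-region

# Stub standing in for the real per-sub-region query operation.
def stub_query(region: dict, key: str) -> Result:
    return region["status"], region.get("value")

regions = [{"status": "aborted"}, {"status": "ended"},
           {"status": "found", "value": b"x"}, {"status": "aborted"}]
value, aborted = query_all(regions, "any-key", stub_query)
```

The loop stops at the first sub-region that yields the data, so the fourth sub-region in the example is never visited.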

If the data to be queried has not been found after the query operation has been performed in all the sub-regions, a misjudgment may have occurred in step S305. As mentioned above, if step S305 determines that the data to be queried has no possibility of having been deleted from the queried sub-region, the data has certainly not been deleted from it. If, however, step S305 determines that the data to be queried has a possibility of having been deleted from the queried sub-region, the result is not definite, and the data may in fact not have been deleted from the queried sub-region. In that case, aborting the query operation of the queried sub-region according to step S310 may cause the data to be missed.

To compensate for such missed queries, in some embodiments, the method further comprises: after the query operation has been performed in all the sub-regions of the memory device, in response to the data to be queried not yet having been found, searching for the data to be queried in a traversal manner in at least one query abort sub-region (step S330). If the data to be queried is found in a query abort sub-region, the data query ends with the found data as its result (normal end). If the data to be queried is not found in one query abort sub-region, the search continues in the next query abort sub-region (in a programming language, this can be expressed as m = m + 1, where m is the number of the query abort sub-region). If the data to be queried has still not been found after all query abort sub-regions have been searched in a traversal manner, the data query ends with "data not found" as its result (abnormal end).
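Step S330 can be sketched as a second pass over the sub-regions that were marked as query abort sub-regions. The record layout (dictionaries with `key`, `value` and a `valid` header flag) and the function name are hypothetical.

```python
from typing import Optional

def recheck_aborted(regions: list, aborted: list, key: str) -> Optional[bytes]:
    """Step S330: traverse each query abort sub-region in turn (m = m + 1)."""
    m = 0
    while m < len(aborted):
        for record in regions[aborted[m]]:
            # Deleted records stay in place but are flagged invalid in their header.
            if record["valid"] and record["key"] == key:
                return record["value"]          # normal end
        m = m + 1
    return None                                 # abnormal end: not found anywhere

regions = [
    [{"key": "a", "value": b"old", "valid": False}],   # genuinely deleted here
    [{"key": "a", "value": b"new", "valid": True}],    # skipped because of a misjudgment
    [{"key": "b", "value": b"z", "valid": True}],
]
result = recheck_aborted(regions, [0, 1], "a")
```

The recheck costs a full traversal of each aborted sub-region, but it runs only when the fast path has failed to find the data.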

It should be understood that a misjudgment may occur whenever data has been deleted within a sub-region. By providing the first bloom filter configured to record data deletions within the corresponding sub-region, the present application can greatly reduce the number of misjudgments. However, as mentioned above, the first bloom filter itself may misjudge. In particular, when a large amount of data has been deleted, the first bloom filter is more likely to misjudge. To further reduce the misjudgment rate of the bloom filter, the method according to the embodiment of the present application may also periodically execute a dirty data cleaning policy. In particular, the at least one sub-region of the memory device comprises a first sub-region and a second sub-region, wherein the second sub-region stores no data. The method further comprises: in response to the first bloom filter of the first sub-region showing that deleted data exists in the first sub-region, moving the data in the first sub-region other than the deleted data to the second sub-region; and performing an erase operation on the first sub-region. In these steps, the first sub-region, that is, the sub-region in which deleted data exists, may first be identified by means of the first bloom filter. The data in this sub-region that has not been deleted can then be moved to another, empty sub-region, namely the second sub-region. After the move, the second sub-region contains only data that has not been deleted, and the first bloom filter of the second sub-region is empty, that is, all of its bits hold the initial value. The probability of a misjudgment in the second sub-region therefore becomes small. In addition, an erase operation may be performed on the sub-region in which the deleted data exists (i.e., the first sub-region). In an erase operation, the values of all bits in the sub-region return to the initial value. The erased sub-region in effect becomes a new second sub-region.

When the above method is implemented, dirty data cleaning can be performed as soon as only one sub-region of the entire memory remains without any data written, so that the memory never runs out of empty sub-regions. In addition, if dirty data cleaning cannot free up a sub-region with no data written, the database is full and no more data can be inserted.
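The dirty data cleaning policy and its trigger condition can be sketched as follows. This is an illustration under stated assumptions only: the `SubRegion` class, the 128-bit filter size, the record layout, and the salted SHA-256 hash are hypothetical, and the erase is modelled simply by resetting the sub-region to its initial state.

```python
import hashlib
from dataclasses import dataclass, field

INITIAL = 1
FILTER_BITS = 128   # illustrative filter size

def bit_positions(key: str, m: int = FILTER_BITS, k: int = 3) -> list[int]:
    """Salted SHA-256 stands in for the filters' hash functions (an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def erased_filter() -> list[int]:
    return [INITIAL] * FILTER_BITS

@dataclass
class SubRegion:
    records: list = field(default_factory=list)                  # dicts: key, value, valid
    deleted_bits: list = field(default_factory=erased_filter)    # first bloom filter
    inserted_bits: list = field(default_factory=erased_filter)   # second bloom filter

def clean_dirty_data(first: SubRegion, second: SubRegion) -> bool:
    """Move valid records from `first` into the empty `second`, then erase `first`."""
    if all(b == INITIAL for b in first.deleted_bits):
        return False                                   # no deletion recorded in `first`
    for record in first.records:
        if record["valid"]:
            second.records.append(record)
            for p in bit_positions(record["key"]):
                second.inserted_bits[p] = 0            # record the insertion in `second`
    first.records = []                                 # erase: everything back to initial
    first.deleted_bits = erased_filter()
    first.inserted_bits = erased_filter()
    return True

def cleaning_due(regions: list) -> bool:
    """Trigger from the text: clean when exactly one sub-region is still unwritten."""
    return sum(1 for r in regions if not r.records) == 1

first, second = SubRegion(), SubRegion()
first.records = [{"key": "a", "value": b"1", "valid": True},
                 {"key": "b", "value": b"2", "valid": False}]
for p in bit_positions("b"):
    first.deleted_bits[p] = 0                          # deletion of "b" was recorded earlier
```

After `clean_dirty_data(first, second)` runs, `second` holds only the valid record with an untouched deletion filter, and `first` is back in the erased state and can serve as the next empty sub-region.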

In summary, the method for querying data in the memory device according to the embodiment of the present application can achieve all the advantages of the memory device according to the embodiment of the present application. For example, a bloom filter can be used to determine whether data has been deleted from a given sub-region, which improves the speed and accuracy of data query. In addition, according to the method provided by the embodiment of the application, misjudgments of the first bloom filter are compensated for by rechecking the query abort sub-regions, which improves the query accuracy. In addition, the embodiment of the application also provides a dirty data cleaning policy, which further improves the query accuracy.

It should be understood that the above embodiments are described by way of example only. While the embodiments have been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, and the scope of the application is not limited to the disclosed embodiments.

Terms such as first, second, etc. may be used in this application to describe various devices, elements, components or sections, but are not intended to limit the devices, elements, components or sections in sequence or importance. These terms are only used to distinguish one device, element, component or section from another device, element, component or section.

Other variations to the disclosed embodiments can be understood and effected by those skilled in the art from a study of the drawings, the disclosure, and the appended claims. In this application, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. Features which are listed in mutually different embodiments may be combined without conflict. The order in which the steps of a method according to an embodiment of the present application are recited in the context of the present application should not be construed as limiting the order in which the steps are performed, unless explicitly defined otherwise.

Claims

1. A memory device comprising at least one sub-region, wherein each of the at least one sub-region comprises a first bloom filter configured to record deletions of data within the corresponding sub-region and a second bloom filter configured to record insertions of data within the corresponding sub-region.

2. The memory device of claim 1, wherein a ratio of space taken up by the first bloom filter and the second bloom filter in total within a corresponding sub-region to storage space of the corresponding sub-region is less than or equal to a space ratio threshold.

3. The memory device of claim 1, wherein the first bloom filter is configured to, in response to deleted data being deleted from a corresponding sub-region, map a key of the deleted data to a hash value using a hash function of the first bloom filter and cause a value of a bit corresponding to the hash value within the first bloom filter to be different from an initial value of the bit.

4. The memory device of claim 1, wherein the second bloom filter is configured to, in response to insertion of inserted data into a corresponding sub-region, map a key of the inserted data to a hash value using a hash function of the second bloom filter and cause a value of a bit within the second bloom filter corresponding to the hash value to be different from an initial value of the bit.

5. The memory device of claim 3 or 4, wherein the initial value of the bit is 1.

6. The memory device of claim 1, wherein the memory device comprises a NOR-type flash memory.

7. A method of querying data in a memory device, the memory device comprising at least one sub-region, each of the at least one sub-region comprising a first bloom filter configured to record deletions of data within the corresponding sub-region, the at least one sub-region comprising at least one queried sub-region, wherein the method comprises: performing a query operation in the queried subregion, wherein the query operation comprises:

determining, with a first bloom filter of the queried sub-region, whether data to be queried has a likelihood of having been deleted from the queried sub-region;

in response to the data to be queried having a likelihood of having been deleted from the queried subregion, aborting a query operation for the queried subregion, and marking the queried subregion as a query abort subregion.

8. The method of claim 7, wherein determining whether data to be queried has a likelihood of having been deleted from the queried subregion using a first bloom filter for the queried subregion comprises:

calculating hash values of a key of the data to be queried by using a hash function of the first bloom filter of the queried sub-region to obtain a first set of hash values;

determining a value of a bit within a first bloom filter of the queried sub-region corresponding to the first set of hash values;

determining that the data to be queried has a likelihood of having been deleted from the queried sub-region in response to the values of the bits all being different from the initial value of the bits.

9. The method of claim 7, wherein the method further comprises:

in response to deletion of deleted data from a corresponding sub-region, calculating hash values of a key of the deleted data by using a hash function of the first bloom filter of the corresponding sub-region to obtain a second set of hash values; and

causing a value of a bit within the first bloom filter of the corresponding sub-region corresponding to the second set of hash values to be different from an initial value of the bit.

10. The method of claim 7, wherein each sub-region further comprises a second bloom filter configured to record insertions of data within the corresponding sub-region, wherein the query operation further comprises:

in response to the data to be queried not having a likelihood of having been deleted from the queried sub-region, determining, with a second bloom filter of the queried sub-region, whether the data to be queried has a likelihood of being present in the queried sub-region;

in response to the data to be queried having a likelihood of being present in the queried sub-region, searching for the data to be queried in the queried sub-region in a traversal manner;

in response to the data to be queried not having a likelihood of being present in the queried sub-region, ending the querying operation of the queried sub-region.

11. The method of claim 10, wherein determining whether the data to be queried has a likelihood of being present in the queried sub-region with a second bloom filter of the queried sub-region comprises:

calculating hash values of a key of the data to be queried by using a hash function of the second bloom filter of the queried sub-region to obtain a third set of hash values;

determining a value of a bit corresponding to the third set of hash values within the second bloom filter of the queried sub-region;

determining that the data to be queried has a likelihood of being present in the queried sub-region in response to the values of the bits all being different from the initial value of the bits.

12. The method of claim 10, wherein the method further comprises:

in response to added data being added to a corresponding sub-region, calculating hash values of a key of the added data using a hash function of the second bloom filter of the corresponding sub-region to obtain a fourth set of hash values; and

causing a value of a bit corresponding to the fourth set of hash values within the second bloom filter of the corresponding sub-region to be different from the initial value of the bit.

13. A method as claimed in claim 8, 9, 11 or 12, wherein the initial value of the bit is 1.

14. The method of claim 7, wherein the method further comprises: after the query operation has been performed in all the sub-regions of the memory device, in response to the data to be queried not yet having been found, searching for the data to be queried in a traversal manner in at least one query abort sub-region.

15. The method of claim 7, wherein the at least one sub-region comprises a first sub-region and a second sub-region, the second sub-region storing no data, the method further comprising:

in response to the first bloom filter of the first sub-region indicating that deleted data exists in the first sub-region, moving data in the first sub-region other than the deleted data to the second sub-region; and

performing an erase operation on the first sub-region.