WO2019148497A1 - Data query method and apparatus - Google Patents

Data query method and apparatus

Info

Publication number
WO2019148497A1
Authority
WO
WIPO (PCT)
Prior art keywords
file name
file
current
accessed
directory
Prior art date
Application number
PCT/CN2018/075300
Other languages
English (en)
French (fr)
Inventor
高翔
杜维
陈俊彦
汪宁
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to US16/967,660 (US11507533B2, en)
Priority to CN201880036991.5A (CN110709824B, zh)
Priority to PCT/CN2018/075300 (WO2019148497A1, zh)
Priority to EP18903433.3A (EP3736705B1, en)
Publication of WO2019148497A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1727 Details of free space management performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0605 Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Definitions

  • The embodiments of the present invention relate to the field of computer technologies and, in particular, to a data query method and apparatus.
  • File systems usually organize the topological relationships of files in a tree structure.
  • Here, files refer only to directory files and ordinary files.
  • Leaf nodes in the tree structure represent ordinary files.
  • Nodes other than leaf nodes represent directory files.
  • A directory file includes a plurality of directory entries. Each directory entry includes a file name, a file type, and an index node (inode) number, and the computer can obtain the data of a file from the inode identified by that number.
  • When a computer obtains a file (for example, file A), it needs to query the file name in the directory file to obtain the directory entry corresponding to the file name of file A, and then obtain the data of file A according to that directory entry.
  • In some file systems, the directory structure is a hash tree comprising multi-level hash tables; each level of hash table includes multiple hash values, the file name corresponding to each hash value, and the index number of the file. In other file systems, such as the New Technology File System (NTFS) and the B-tree file system (Btrfs), the directory structure is a B+ tree of order n (n ≥ 1).
  • F2FS Flash Friendly File System
  • EXT4 Fourth Extended File System
  • If the directory structure in the file system is a hash tree, the computer needs to query the hash tables level by level when acquiring the data of the file to be accessed.
  • Within each hash table, the query first traverses the hash values and then matches the file name.
  • When the hash tree has many levels, querying the file name to be accessed in the hash tables is inefficient. In addition, the hash tree stores a large number of hash values, which lowers the effective utilization of the storage space, and when there are many levels the high-level hash tables may not be full, which lowers the effective utilization of the storage space further.
  • If the directory structure in the file system is an n-order B+ tree, then when the computer obtains the data of the file to be accessed it either queries sequentially from the smallest keyword or starts a random query from the root node, so querying the file name to be accessed is inefficient.
  • In addition, the keywords in the leaf nodes of the n-order B+ tree also appear in the intermediate nodes, which reduces the effective utilization of the storage space.
  • In summary, the efficiency with which the computer queries the file name to be accessed is low, and the effective utilization of the computer's storage space is low.
  • The embodiments of the present invention provide a data query method and device that address the low efficiency with which a computer queries the file name to be accessed and the low effective utilization of the computer's storage space.
  • A data query method is provided. The method is applied to a read-only file system comprising n (n ≥ 1) directory blocks, each directory block comprising a directory entry area and a file name area.
  • The data query method is as follows: the data query device determines a target directory block from the n directory blocks. The directory entry area of the target directory block includes m directory entries, and the file name area of the target directory block includes m file names.
  • The m directory entries correspond one-to-one to the m file names, and the m directory entries and the m file names are arranged in the order given by a preset rule.
  • The file name to be accessed falls within a file name range, and that file name range is determined by the target directory block.
  • The data query device then performs step A1, step B1, and step C1.
  • Step A1: determine the current first set and the current second set according to a binary search algorithm and the target directory block.
  • The current first set includes x consecutive file names among the m file names (m ≥ x ≥ 1).
  • The current second set includes the x file names, a first file name, and a second file name. The first file name is the file name that is arranged before, and adjacent to, the first of the x file names; the second file name is the file name that is arranged after, and adjacent to, the last of the x file names.
  • Step B1: determine a first common prefix between the file name to be accessed and the file names in the current second set.
  • Step C1: compare the file name to be accessed with a third file name character by character, starting from the first character after the first common prefix. The third file name is the file name at a first preset position in the current first set.
  • After step C1 is performed, if the file name to be accessed is the same as the third file name, the data query device acquires the data of the file to be accessed according to the directory entry corresponding to the third file name.
  • The data query device in the present application determines a first common prefix between the file name to be accessed and the file names in the current second set. Because the current second set covers the current first set, every file name in the current first set shares the first common prefix with the file name to be accessed. The data query device can therefore compare the file name to be accessed with the third file name character by character starting from the first character after the first common prefix, which effectively increases the speed at which the file to be accessed is found.
  • The directory structure in the embodiments of the present application stores only directory entries and file names, and no other information related to file names or directory entries, which effectively improves the utilization of the storage space.
  • If the file name to be accessed is different from the third file name, the data query device re-determines the first set and the second set according to the binary search algorithm, the current first set, and the current second set, and performs step B1 and step C1 again with the re-determined first set and the re-determined second set, until the data of the file to be accessed is obtained or it is determined that the target directory block does not include the file name to be accessed.
  • That is, if the file name to be accessed is different from the third file name, the data query device further narrows the query range (re-determines the first set) and re-determines the first common prefix.
  • The preset rule is lexicographic order, and the first set and the second set are re-determined as follows when the file name to be accessed is different from the third file name. If the feature value of the file name to be accessed is smaller than the feature value of the third file name, the re-determined first set includes all file names in the current first set that are located before the third file name, and the re-determined second set includes the first file name, all file names in the current first set that are located before the third file name, and the third file name. If the feature value of the file name to be accessed is greater than the feature value of the third file name, the re-determined first set includes all file names in the current first set that are located after the third file name, and the re-determined second set includes the third file name, all file names in the current first set that are located after the third file name, and the second file name.
  • The method of “determining a first common prefix between the file name to be accessed and the file names in the current second set” is: determine a first prefix shared between the file name to be accessed and the first file name; determine a second prefix shared between the file name to be accessed and the second file name; and determine the shorter of the first prefix and the second prefix as the first common prefix.
  • Because all file names in each directory block are arranged in the order given by the preset rule, the file name range formed by all file names in the current second set is larger than the file name range formed by all file names in the current first set.
  • The data query device can therefore determine the first prefix shared between the file name to be accessed and the first file name, and the second prefix shared between the file name to be accessed and the second file name, and take the shorter of the first prefix and the second prefix as the first common prefix.
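Steps A1, B1, and C1 can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the function names `common_prefix_len` and `find_in_block` are hypothetical, the file names of one directory block are assumed to be held in an in-memory sorted list, the "first preset position" is assumed to be the middle of the current first set, and a missing neighbour at either end of the block is treated as sharing a prefix of length 0.

```python
from typing import Sequence


def common_prefix_len(a: str, b: str, start: int = 0) -> int:
    """Length of the prefix shared by a and b; the first `start`
    characters are assumed to be already known to match."""
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def find_in_block(names: Sequence[str], target: str) -> int:
    """Return the index of `target` in the sorted `names`, or -1."""
    lo, hi = 0, len(names) - 1      # bounds of the current first set
    left_lcp = 0    # prefix length shared with the "first file name"
    right_lcp = 0   # prefix length shared with the "second file name"
    while lo <= hi:
        mid = (lo + hi) // 2        # assumed first preset position
        # Step B1: the first common prefix is the shorter of the two.
        skip = min(left_lcp, right_lcp)
        # Step C1: compare only the characters after that prefix.
        k = common_prefix_len(names[mid], target, skip)
        if k == len(target) == len(names[mid]):
            return mid
        if target[k:] < names[mid][k:]:
            hi = mid - 1            # keep the names before names[mid]
            right_lcp = k           # names[mid] is the new second file name
        else:
            lo = mid + 1            # keep the names after names[mid]
            left_lcp = k            # names[mid] is the new first file name
    return -1
```

For example, `find_in_block(["alpha", "alpine", "beta"], "alpine")` returns 1. The saving comes from `skip`: once both neighbours are known to share a prefix with the target, those characters are never compared again.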
  • The method by which the data query device determines the target directory block from the n directory blocks is: the data query device performs step A2, step B2, and step C2 in sequence.
  • Step A2: determine a current third set and a current fourth set according to the binary search algorithm and the n directory blocks. The current third set includes p file names (1 ≤ p ≤ n), namely the file name at a second preset position in each of p directory blocks; the file names in the current third set are arranged in the order given by the preset rule, and the p directory blocks are consecutive directory blocks among the n directory blocks.
  • The current fourth set includes the p file names, a fourth file name, and a fifth file name. The fourth file name is the file name that is arranged before, and adjacent to, the first of the p file names; the fifth file name is the file name that is arranged after, and adjacent to, the last of the p file names.
  • Step B2: determine a second common prefix between the file name to be accessed and the file names in the current fourth set.
  • Step C2: compare the file name to be accessed with a sixth file name character by character, starting from the first character after the second common prefix. The sixth file name is the file name at a third preset position in the current third set.
  • After step C2 is performed, if the file name to be accessed is the same as the sixth file name, the directory block to which the sixth file name belongs is determined as the target directory block.
  • the method of determining the target directory block by the data query device is similar to the method for the data query device to query the file name to be accessed.
  • the preset rule is a lexicographic order
  • the file name of the second preset location is the first file name of the directory block.
  • When p = 1: if the feature value of the file name to be accessed is greater than the feature value of the file name in the current third set, the data query device determines the directory block to which the file name in the current third set belongs as the target directory block; or, if the feature value of the file name to be accessed is smaller than the feature value of the file name in the current third set, the data query device determines, as the target directory block, the directory block to which the file name located before and adjacent to the file name in the current third set belongs.
  • The method of “re-determining the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm” is as follows. If the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that are located before the sixth file name, and the re-determined fourth set includes the fourth file name, all file names in the current third set that are located before the sixth file name, and the sixth file name.
  • If the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that are located after the sixth file name, and the re-determined fourth set includes the sixth file name, all file names in the current third set that are located after the sixth file name, and the fifth file name.
  • the preset rule is a lexicographic order
  • the file name of the second preset location is the first file name of the directory block.
  • If the sixth file name is the first file name in the current third set and the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, the directory block to which the file name located before and adjacent to the sixth file name belongs is determined as the target directory block.
  • If the sixth file name is the last file name in the current third set and the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the directory block to which the sixth file name belongs is determined as the target directory block.
  • The method of “determining a second common prefix between the file name to be accessed and the file names in the current fourth set” is: determine a third prefix shared between the file name to be accessed and the fourth file name; determine a fourth prefix shared between the file name to be accessed and the fifth file name; and determine the shorter of the third prefix and the fourth prefix as the second common prefix.
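Steps A2, B2, and C2 apply the same prefix-skipping binary search one level up, over one representative file name per directory block. The sketch below is only an illustration under stated assumptions: `find_target_block` and `shared_len` are hypothetical names, the preset rule is lexicographic order, the representative (second preset position) is the first file name of each block, and -1 is returned when the file name to be accessed sorts before every block, a case the text above does not spell out.

```python
from typing import Sequence


def shared_len(a: str, b: str, start: int = 0) -> int:
    """Length of the prefix shared by a and b, resuming at `start`."""
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def find_target_block(first_names: Sequence[str], target: str) -> int:
    """Index of the last block whose first file name is <= target,
    or -1 if target sorts before every block."""
    lo, hi = 0, len(first_names) - 1   # bounds of the current third set
    left_lcp = 0     # prefix shared with the "fourth file name"
    right_lcp = 0    # prefix shared with the "fifth file name"
    candidate = -1
    while lo <= hi:
        mid = (lo + hi) // 2           # assumed third preset position
        skip = min(left_lcp, right_lcp)              # step B2
        k = shared_len(first_names[mid], target, skip)   # step C2
        if k == len(target) == len(first_names[mid]):
            return mid                 # exact match on a first file name
        if target[k:] < first_names[mid][k:]:
            hi = mid - 1
            right_lcp = k
        else:
            candidate = mid            # target may live in this block
            lo = mid + 1
            left_lcp = k
    return candidate
```

With first file names `["apple", "grape", "mango", "peach"]`, the name `"kiwi"` maps to block 1, because `"grape"` is the largest first file name that does not exceed it.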
  • Where the n directory blocks are arranged in the order given by the preset rule and stored in a complete binary tree manner, the method by which the data query device determines the target directory block from the n directory blocks is: the data query device performs step A3 and step B3 in sequence.
  • Step A3: determine the current candidate directory block and the current third common prefix. Step B3: compare the file name to be accessed with the i-th file name character by character, starting from the first character after the current third common prefix. The i-th file name is the file name at a fourth preset position in the i-th directory block of the n directory blocks, 0 ≤ i < n.
  • If the file name to be accessed is the same as the i-th file name, the data query device re-determines the candidate directory block as the directory block to which the i-th file name belongs, and determines the re-determined candidate directory block as the target directory block.
  • If the file name to be accessed is different from the i-th file name, the data query device re-determines the third common prefix, the candidate directory block, and the i-th file name, and performs step B3 again until the target directory block is determined.
  • The candidate directory block is re-determined as follows: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, the re-determined candidate directory block is the directory block to which the i-th file name belongs; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, the re-determined candidate directory block is the current candidate directory block.
  • The preset rule order is lexicographic order.
  • The file name at the fourth preset position is the first file name in the corresponding directory block.
  • The method of “re-determining the third common prefix if the file name to be accessed is different from the i-th file name” is: when the feature value of the file name to be accessed is greater than the feature value of the i-th file name, the current first target prefix is updated to the prefix common to the file name to be accessed and the i-th file name, and the shorter of the updated first target prefix and the current second target prefix is determined as the re-determined third common prefix; or, when the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, the current second target prefix is updated to the prefix common to the file name to be accessed and the i-th file name, and the shorter of the current first target prefix and the updated second target prefix is determined as the re-determined third common prefix.
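Steps A3 and B3 can be sketched as a descent through a complete binary tree. Several details below are assumptions made for illustration rather than facts from the text: the tree is held as a level-order array in which node i has children 2i+1 and 2i+2, each node carries the first file name of one directory block, and the names `to_level_order`, `prefix_len`, and `find_candidate_block` are hypothetical.

```python
from typing import List, Sequence


def to_level_order(sorted_names: Sequence[str]) -> List[str]:
    """Lay sorted names out as a complete binary search tree in an array."""
    n = len(sorted_names)
    out = [""] * n
    it = iter(sorted_names)

    def fill(i: int) -> None:
        if i < n:
            fill(2 * i + 1)       # left subtree gets the smaller names
            out[i] = next(it)
            fill(2 * i + 2)       # right subtree gets the larger names

    fill(0)
    return out


def prefix_len(a: str, b: str, start: int = 0) -> int:
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def find_candidate_block(tree: Sequence[str], target: str) -> int:
    """Array index of the block that may hold `target`, or -1."""
    i = 0
    candidate = -1        # current candidate directory block (step A3)
    first_target = 0      # prefix shared with the nearest smaller name
    second_target = 0     # prefix shared with the nearest larger name
    while i < len(tree):
        skip = min(first_target, second_target)   # third common prefix
        k = prefix_len(tree[i], target, skip)     # step B3
        if k == len(target) == len(tree[i]):
            return i
        if target[k:] > tree[i][k:]:
            candidate = i         # block i becomes the candidate
            first_target = k
            i = 2 * i + 2         # continue in the right subtree
        else:
            second_target = k     # candidate stays unchanged
            i = 2 * i + 1         # continue in the left subtree
    return candidate
```

A lookup touches at most one node per tree level, and the third common prefix only grows or stays the same as the two bounding names close in on the target.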
  • In a second aspect, a read-only file system is provided. The read-only file system includes a directory file composed of n directory blocks; each directory block includes a directory entry area and a file name area, the directory entry area includes at least one directory entry, and the file name area includes at least one file name.
  • The number of directory entries in a directory block is the same as the number of file names, and all directory entries and all file names in the directory block are arranged in the order given by a preset rule.
  • Each of the at least one directory entry includes an index number, a file type, and the offset, within the directory block to which the directory entry belongs, of the file name corresponding to that directory entry. The file name area is adjacent to the directory entry area and is located after the directory entry area.
  • A data query device is provided, the device having a read-only file system as described in the second aspect above and any possible implementation thereof.
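The block layout described above (a directory entry area followed directly by a file name area, with each entry holding an index number, a file type, and a name offset) can be sketched as follows. The field widths, the little-endian byte order, the padding byte, and the rule that a name ends where the next name begins are all assumptions chosen for illustration; `ENTRY`, `build_block`, and `parse_block` are hypothetical names.

```python
import struct
from typing import List, Tuple

# Hypothetical 12-byte entry: 8-byte index number, 2-byte name offset
# within the block, 1-byte file type, 1 padding byte.
ENTRY = struct.Struct("<QHBx")

Record = Tuple[str, int, int]   # (file name, index number, file type)


def build_block(records: List[Record]) -> bytes:
    """Pack records into entry area + name area, sorted by file name."""
    ordered = sorted(records, key=lambda r: r[0])
    offset = ENTRY.size * len(ordered)   # name area starts after entries
    head = bytearray()
    tail = bytearray()
    for name, inode, ftype in ordered:
        raw = name.encode("utf-8")
        head += ENTRY.pack(inode, offset, ftype)
        tail += raw
        offset += len(raw)
    return bytes(head + tail)


def parse_block(block: bytes) -> List[Record]:
    """Recover the records from a block holding at least one entry."""
    # The first name offset marks the end of the entry area, which
    # gives the number of entries without storing a separate count.
    first_offset = ENTRY.unpack_from(block, 0)[1]
    count = first_offset // ENTRY.size
    entries = [ENTRY.unpack_from(block, i * ENTRY.size) for i in range(count)]
    out = []
    for i, (inode, off, ftype) in enumerate(entries):
        end = entries[i + 1][1] if i + 1 < count else len(block)
        out.append((block[off:end].decode("utf-8"), inode, ftype))
    return out
```

Because only entries and names are stored, the m-th name can be reached directly through its entry's offset, which is what makes binary search over a block possible without scanning the name area.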
  • the data querying device includes a processing unit and an acquisition unit.
  • the processing unit is configured to determine a target directory block from the n directory blocks of the read-only file system, where the directory entry area of the target directory block includes m directory entries, and the file name area of the target directory block includes m file names.
  • the m directory entries correspond to the m file names one by one, and the m directory entries and the m file names are arranged in the order of the preset rules.
  • the file name to be accessed is located in the file name range, and the file name range is determined by the target directory block.
  • the processing unit is further configured to perform step A1, step B1, and step C1.
  • Step A1: determine a current first set and a current second set according to the binary search algorithm and the target directory block, where the current first set includes x consecutive file names among the m file names.
  • The current second set includes the x file names, a first file name, and a second file name.
  • The first file name is the file name that is arranged before, and adjacent to, the first of the x file names.
  • The second file name is the file name that is arranged after, and adjacent to, the last of the x file names.
  • Step B1: determine a first common prefix between the file name to be accessed and the file names in the current second set.
  • Step C1: compare the file name to be accessed with the third file name character by character, starting from the first character after the first common prefix, where the third file name is the file name at a first preset position in the current first set.
  • the obtaining unit is configured to: if the processing unit determines that the file name to be accessed is the same as the third file name, obtain the data of the file to be accessed according to the directory entry corresponding to the third file name.
  • The processing unit is further configured to: if the file name to be accessed is different from the third file name, re-determine the first set and the second set according to the binary search algorithm, the current first set, and the current second set, and perform step B1 and step C1 according to the re-determined first set and the re-determined second set, until the obtaining unit acquires the data of the file to be accessed or the processing unit determines that the target directory block does not include the file name to be accessed.
  • the preset rule is a lexicographic sequence
  • The processing unit is specifically configured to: if the feature value of the file name to be accessed is smaller than the feature value of the third file name, determine that the re-determined first set includes all file names in the current first set that are located before the third file name, and that the re-determined second set includes the first file name, all file names in the current first set that are located before the third file name, and the third file name.
  • Or, if the feature value of the file name to be accessed is greater than the feature value of the third file name, determine that the re-determined first set includes all file names in the current first set that are located after the third file name, and that the re-determined second set includes the third file name, all file names in the current first set that are located after the third file name, and the second file name.
  • The processing unit is specifically configured to: determine a first prefix shared between the file name to be accessed and the first file name; determine a second prefix shared between the file name to be accessed and the second file name; and determine the shorter of the first prefix and the second prefix as the first common prefix.
  • The processing unit is further configured to perform step A2, step B2, and step C2.
  • Step A2: determine a current third set and a current fourth set according to the binary search algorithm and the n directory blocks. The current third set includes p file names, namely the file name at a second preset position in each of p directory blocks, and the file names in the current third set are arranged in the order given by the preset rule.
  • The p directory blocks are consecutive directory blocks among the n directory blocks.
  • The current fourth set includes the p file names, a fourth file name, and a fifth file name. The fourth file name is the file name that is arranged before, and adjacent to, the first of the p file names; the fifth file name is the file name that is arranged after, and adjacent to, the last of the p file names.
  • Step B2: determine a second common prefix between the file name to be accessed and the file names in the current fourth set.
  • Step C2: compare the file name to be accessed with the sixth file name character by character, starting from the first character after the second common prefix, where the sixth file name is the file name at a third preset position in the current third set.
  • the processing unit is further configured to: if the file name to be accessed is the same as the sixth file name, determine that the directory block to which the sixth file name belongs is the target directory block.
  • The processing unit is specifically configured to: when 2 ≤ p ≤ n, re-determine the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm, and perform step B2 and step C2 according to the re-determined third set and the re-determined fourth set; when p = 1, determine the target directory block according to the file name included in the current third set.
  • The preset rule is lexicographic order, and for each directory block, the file name at the second preset position is the first file name of the directory block.
  • The processing unit is specifically configured to: if the feature value of the file name to be accessed is greater than the feature value of the file name in the current third set, determine the directory block to which the file name in the current third set belongs as the target directory block; or, if the feature value of the file name to be accessed is smaller than the feature value of the file name in the current third set, determine, as the target directory block, the directory block to which the file name located before and adjacent to the file name in the current third set belongs.
  • the preset rule is a lexicographic sequence
  • The processing unit is specifically configured to: if the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, determine that the re-determined third set includes all file names in the current third set that are located before the sixth file name, and that the re-determined fourth set includes the fourth file name, all file names in the current third set that are located before the sixth file name, and the sixth file name.
  • Or, if the feature value of the file name to be accessed is greater than the feature value of the sixth file name, determine that the re-determined third set includes all file names in the current third set that are located after the sixth file name, and that the re-determined fourth set includes the sixth file name, all file names in the current third set that are located after the sixth file name, and the fifth file name.
  • the preset rule is a lexicographic order
  • The file name at the second preset position is the first file name of the directory block.
  • The processing unit is further configured to: if the sixth file name is the first file name in the current third set and the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, determine, as the target directory block, the directory block to which the file name located before and adjacent to the sixth file name belongs; if the sixth file name is the last file name in the current third set and the feature value of the file name to be accessed is greater than the feature value of the sixth file name, determine the directory block to which the sixth file name belongs as the target directory block.
  • the processing unit is specifically configured to: determine a third prefix shared between the file name to be accessed and the fourth file name; determine the file name to be accessed and the fifth file A fourth prefix shared between the file names; determining the smallest one of the third prefix and the fourth prefix as the second common prefix.
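  • The following is a minimal Python sketch of why the shorter of the two boundary prefixes can be skipped safely. It assumes lexicographic order and treats the fourth and fifth file names as the lower and upper boundary names of the range being searched; the function names and the sample file names are hypothetical and are not taken from the application.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest prefix shared by a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def second_common_prefix_len(target: str, lower: str, upper: str) -> int:
    """Shorter of the prefixes that target shares with the two boundary names."""
    third = common_prefix_len(target, lower)   # prefix shared with the lower boundary
    fourth = common_prefix_len(target, upper)  # prefix shared with the upper boundary
    return min(third, fourth)


# Hypothetical file names, sorted in lexicographic order.
names = ["ANDY", "ANGEL", "ANNA", "ANTON", "BABY"]
target = "ANNE"

# Boundaries ANDY (lower) and ANTON (upper).
skip = second_common_prefix_len(target, names[0], names[3])

# Every name lying between the two boundaries starts with target[:skip],
# so a character-by-character comparison may begin at index `skip`.
shared_by_all = all(name.startswith(target[:skip]) for name in names[0:4])
```

  • In a sorted sequence, any name lying between two names that both start with the same prefix must also start with that prefix, which is what lets the comparison begin after it.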
  • the foregoing n directory blocks are arranged in a preset rule order and are stored in a complete binary tree manner.
  • the processing unit is further configured to perform step A3 and step B3, wherein step A3 is: determining a current candidate directory block and a current third common prefix; and step B3 is: starting from the first character after the current third common prefix, comparing the file name to be accessed with the i-th file name character by character, where the i-th file name is the file name of the fourth preset position in the i-th directory block of the n directory blocks, 0 < i ≤ n.
  • the processing unit is further configured to: if the file name to be accessed is the same as the i-th file name, re-determine the candidate directory block as the directory block to which the i-th file name belongs, and determine the re-determined candidate directory block as the target directory block.
  • the processing unit is further configured to re-execute step B3 according to the re-determined third common prefix, the re-determined candidate directory block, and the re-determined i-th file name, until the target directory block is determined.
  • the preset rule order is lexicographic order
  • the file name of the fourth preset location is a first file name in the corresponding directory block.
  • the processing unit is specifically configured to: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, determine that the re-determined candidate directory block is the directory block to which the i-th file name belongs; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, determine that the re-determined candidate directory block is the current candidate directory block.
  • the preset rule order is lexicographic order
  • the file name of the fourth preset location is a first file name in the corresponding directory block.
  • the processing unit is specifically configured to: when the feature value of the file name to be accessed is greater than the feature value of the i-th file name, update the current first target prefix to the prefix shared between the file name to be accessed and the i-th file name, and determine the shorter of the updated first target prefix and the current second target prefix as the re-determined third common prefix; or, when the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, update the current second target prefix to the prefix shared between the file name to be accessed and the i-th file name, and determine the shorter of the current first target prefix and the updated second target prefix as the re-determined third common prefix; wherein the initial values of the length of the first target prefix and the length of the second target prefix are both zero, and both lengths change with the relationship between the feature value of the file name to be accessed and the feature value of the i-th file name.
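  • The search over directory blocks stored as a complete binary tree can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the tree is assumed to be held in a level-order array (children of index i at 2i+1 and 2i+2), each node is represented only by the first file name of its directory block, the preset rule is lexicographic order, and the names `compare_from` and `find_candidate_block` are hypothetical.

```python
def compare_from(a, b, start):
    """Compare a with b, assuming a[:start] == b[:start].

    Returns (sign, shared): sign is -1, 0 or 1, and shared is the length
    of the prefix common to a and b.
    """
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    if i < limit:
        return (-1 if a[i] < b[i] else 1), i
    # One name is a prefix of the other (or they are equal).
    return (len(a) > len(b)) - (len(a) < len(b)), i


def find_candidate_block(first_names, target):
    """Return the index of the block whose first name is the largest one
    not greater than target, or None if target precedes every block."""
    candidate = None
    first_prefix = 0   # shared with the last name found smaller than target
    second_prefix = 0  # shared with the last name found greater than target
    i = 0
    while i < len(first_names):
        skip = min(first_prefix, second_prefix)  # third common prefix
        sign, shared = compare_from(target, first_names[i], skip)
        if sign == 0:
            return i
        if sign > 0:
            candidate = i          # target may be inside this block
            first_prefix = shared
            i = 2 * i + 2          # continue in the right subtree
        else:
            second_prefix = shared
            i = 2 * i + 1          # continue in the left subtree
    return candidate


# Level-order array of first file names (hypothetical directory blocks).
tree = ["DASH", "BABY", "EMMA", "ANDY", "CAFE"]
block_for_cat = find_candidate_block(tree, "CAT")
```

  • In this sketch both prefix lengths start at zero and only the one on the side that was just compared is updated, so the number of skipped characters never exceeds what the two nearest boundary names share with the file name to be accessed.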
  • a terminal comprising: one or more processors, a memory, and a communication interface.
  • the memory and the communication interface are coupled to the one or more processors; the memory is configured to store computer program code, the computer program code comprising instructions, and when the one or more processors execute the instructions, the terminal performs the data query method described in the first aspect and any one of its possible implementations.
  • a computer-readable storage medium storing instructions which, when run on the terminal described in the fourth aspect, cause the terminal to perform the data query method described in the first aspect and any one of its possible implementations.
  • a computer program product comprising instructions which, when the computer program product is run on the terminal described in the fourth aspect above, cause the terminal to perform the data query method described in the first aspect and any one of its possible implementations.
  • the name of the above data query device does not limit the device or the function modules themselves. In actual implementation, these devices or function modules may appear under other names. As long as the functions of the respective devices or function modules are similar to those in the present application, they fall within the scope of the claims of the present application and their equivalents.
  • the second aspect to the sixth aspect of the present application may refer to the detailed description in the first aspect and various implementations thereof; and the second aspect to the sixth aspect, and each of
  • FIG. 1 is a schematic diagram of the query flow of a binary search algorithm
  • FIG. 2 is a schematic diagram of a partition structure in a Linux operating system in the prior art
  • FIG. 3 is a schematic diagram of a directory structure of an F2FS file system
  • FIG. 4 is a schematic structural diagram of hardware of a data query apparatus according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a partition of an external memory 42-2 of a data query apparatus according to an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a directory block of a data query device according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram 1 of an arrangement structure of directory blocks of a data query device according to an embodiment of the present application.
  • FIG. 8 is a second schematic structural diagram of a directory block of a data query device according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart diagram of a data query method according to an embodiment of the present application.
  • FIG. 10 is a schematic flowchart 1 of a process for determining a target directory block by a data query apparatus according to an embodiment of the present application
  • FIG. 11 is a second schematic flowchart of determining a target directory block by a data query apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a second set in the embodiment of the present application.
  • FIG. 13 is a schematic flowchart 1 of a process for querying a file name to be accessed by a data query device according to an embodiment of the present application
  • FIG. 14 is a schematic structural diagram 1 of a data query apparatus according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram 2 of a data query apparatus according to an embodiment of the present application.
  • the words “exemplary” or “such as” are used to mean an example, instance, or illustration. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of the words “exemplary” or “such as” is intended to present the concepts in a concrete manner.
  • Binary search: also known as half-interval search. The basic idea is to query data x among m elements arranged in ascending order (such as A1, A2, ..., Am). The middle element, whose index is obtained by rounding down, is compared with x. If the two are equal, the query terminates; if x is less than the middle element, the query continues among all elements less than the middle element until an element equal to x is found; if x is greater than the middle element, the query continues among all elements greater than the middle element until an element equal to x is found.
  • For example, the seven values in ascending order are {1, 4, 7, 8, 10, 15, 20}. If the value to be queried is “7”, the query process is:
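  • A minimal Python sketch of this query on the seven values follows. The midpoint is computed with zero-based indices as `(lo + hi) // 2`, which is an assumption of this sketch; the exact midpoint convention of the flow in FIG. 1 may differ, and the function name is hypothetical.

```python
def binary_search(values, x):
    """Search x in an ascending list.

    Returns (index, probes): index is the zero-based position of x or -1,
    and probes lists the elements that were compared with x, in order.
    """
    lo, hi = 0, len(values) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2          # rounding down
        probes.append(values[mid])
        if values[mid] == x:
            return mid, probes
        if x < values[mid]:
            hi = mid - 1              # continue among the smaller elements
        else:
            lo = mid + 1              # continue among the larger elements
    return -1, probes


values = [1, 4, 7, 8, 10, 15, 20]
index, probes = binary_search(values, 7)
```

  • With this midpoint convention, the value 7 is compared with 8, then with 4, and is then found.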
  • the embodiment of the present application is used to query a file name. Therefore, in the embodiment of the present application, the above m elements are m file names.
  • the size between different file names can be determined by the feature value of the file name (such as the order of the file names after arranging the file names in lexicographic order, etc.).
  • the file system provides a structured data storage and organization form, which uses a tree structure to organize the topological relationships of files, which provides convenience for users to access and query files.
  • the directory file needs to save the names and inode numbers of all subfiles in the directory.
  • the file name is visible to the user, the user manages and accesses the file by using the file name;
  • the inode includes basic information of the file (such as file size, file creation time, file modification time, etc.) and pointer information pointing to the plurality of data blocks in which the data of the file is stored.
  • the computer can obtain the corresponding inode according to the inode number.
  • each data block storing the file data can be determined, and the data of the file is obtained from the determined data block. Since the file system records the file name and index number of the file, in the application scenario of the file system, the user does not need to care about which data blocks the data of the file is stored in, and only needs to remember the directory and file name to which the file belongs to complete the file. Access to data.
  • the storage space provided by the storage medium includes a plurality of partitions, each of which is mounted on one or more file systems.
  • the data for each file is stored in a partition on the storage medium.
  • Each partition of the storage medium is divided into a plurality of blocks.
  • each block is the same size.
  • the size of each block is 1024 bytes or 4096 bytes.
  • the embodiment of the present application refers to a block in which data of a directory file is stored as a directory block, and a block in which data of other types of files is stored is referred to as a data block.
  • each partition of the storage medium includes a super block, an index area, and a data area.
  • the super block stores information about the file system, such as the type of the file system, the number of blocks, and the size of a block.
  • the index area includes k (k ≥ 1) inodes, and each inode includes basic information of a file (such as file size, file creation time, file modification time, etc.) and pointer information pointing to the plurality of data blocks that store the data of the file.
  • the data area includes data of common files and directory files, wherein a directory file is composed of n (n ≥ 1) directory blocks, each directory block includes a plurality of directory entries, one directory entry corresponds to one file, and each directory entry includes the index number (inode number) of a file, the file name of the file, and the file type of the file.
  • the structure of a directory in some file systems is a hash tree including multi-level hash tables, and each level of hash table includes multiple hash values together with the file name corresponding to each hash value and the index number of the file.
  • the structure of the directory in the F2FS file system is a hash tree including multi-level hash tables, and each level of hash table has a dedicated number of hash buckets.
  • Each hash bucket is an array of directory entries, and each directory entry in the hash bucket includes a hash value, a file name corresponding to the hash value, and an index number of the file.
  • Table 1 shows the structure of a directory in an F2FS file system, which is a hash tree including an N-level hash table.
  • A denotes a bucket
  • B denotes a directory block
  • N denotes a hash maximum level
  • A(2B) denotes that one hash bucket includes two directory blocks
  • A(4B) denotes that one hash bucket includes four directory blocks.
  • each hash bucket in each level of hash table includes two directory blocks; starting from the (N/2+1)-th level hash table to the N-th level hash table, each hash bucket in each level of hash table includes four directory blocks.
  • the i-th (i < N/2) level hash table includes 2^i hash buckets
  • the j-th (j ≥ N/2) level hash table includes 2^(N/2-1) hash buckets.
  • Fig. 3 shows the hash tree shown in Table 1 above.
  • the level 0 hash table includes a hash bucket: Bucket 0, and bucket 0 includes two directory blocks, directory block 0 and directory block 1.
  • the contents of other levels of hash tables are similar to the level 0 hash table and the level 1 hash table, and will not be described in detail here.
  • When the computer queries the file name to be accessed in the directory of the F2FS file system, it first calculates a hash value of the file name to be accessed, and then scans the level 0 hash table for that hash value to find the directory entry that includes the file name to be accessed and the index number of the file to be accessed. If no such directory entry is found, the computer scans the level 1 hash table. That is, if the computer does not find a directory entry including the file name to be accessed in one level of hash table, the computer scans the next level of hash table, level by level. In each level of hash table, the computer only needs to scan one hash bucket, whose number is the remainder obtained by dividing the hash value of the file name to be accessed by the number of hash buckets in that level.
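  • The level-by-level lookup described above can be sketched in Python as follows. This is a simplified in-memory model, not F2FS code: the bucket-count thresholds follow one reading of the layout (2^level buckets below level N/2, 2^(N/2-1) buckets from there on), `zlib.crc32` stands in for the file system's own hash function, and the per-bucket `capacity` and all function names are hypothetical.

```python
import zlib


def buckets_in_level(level, max_depth):
    """Number of hash buckets in one level (assumed thresholds, see above)."""
    if level < max_depth // 2:
        return 1 << level
    return 1 << (max_depth // 2 - 1)


def blocks_in_bucket(level, max_depth):
    """Number of directory blocks per bucket in one level (assumed thresholds)."""
    return 2 if level < max_depth // 2 else 4


def name_hash(name):
    # Stand-in hash; the real file system uses its own hash function.
    return zlib.crc32(name.encode("utf-8"))


def make_levels(max_depth):
    """One list of empty buckets per level."""
    return [[[] for _ in range(buckets_in_level(level, max_depth))]
            for level in range(max_depth)]


def insert(levels, name, inode_no, capacity):
    """Place an entry in the first level whose selected bucket has room."""
    h = name_hash(name)
    for level, buckets in enumerate(levels):
        bucket = buckets[h % len(buckets)]
        if len(bucket) < capacity:
            bucket.append((h, name, inode_no))
            return level
    raise RuntimeError("no free slot in any level")


def lookup(levels, name):
    """Scan exactly one bucket per level, starting at level 0."""
    h = name_hash(name)
    for level, buckets in enumerate(levels):
        for entry_hash, entry_name, inode_no in buckets[h % len(buckets)]:
            if entry_hash == h and entry_name == name:
                return level, inode_no
    return None


levels = make_levels(max_depth=4)
placed = [insert(levels, name, ino, capacity=1)
          for ino, name in enumerate(["ANDY", "BABY", "CAFE"], start=1)]
```

  • Each stored entry carries its hash value in addition to the file name and index number, which is the storage overhead that the following paragraphs refer to.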
  • the above hash tree includes a large number of hash values, resulting in a low effective utilization of the storage space.
  • the number of hash buckets in the high-level hash table is also increasing adaptively. In this way, it is highly probable that the high-level hash table is not full, further reducing the effective utilization of the storage space.
  • the n-th order B+ tree is an n-way search tree.
  • a B+ tree includes a root node, an internal node, and a leaf node.
  • the root node may be a leaf node or a node including at least two child nodes.
  • All nodes of the n-th order B+ tree include n keywords, each keyword does not save data, only for indexing, and all data is stored in leaf nodes.
  • All leaf nodes of the n-th order B+ tree include information of all keywords and pointer information pointing to these keywords, and the leaf nodes themselves are linked in ascending order of keyword size.
  • With an n-th order B+ tree, when the computer queries the file name to be accessed in the directory, the computer either queries sequentially starting from the smallest keyword or starts a random query from the root node, so the query efficiency is relatively low.
  • To address the low query efficiency and the low effective utilization of computer storage space in the existing methods, the present application provides a data query method, which is applied to a read-only file system including n (n ≥ 1) directory blocks.
  • each directory block includes a directory entry area and a file name area, the directory entry area includes at least one directory entry, the file name area includes at least one file name, the number of directory entries in the same directory block is the same as the number of file names, and all directory entries and all file names in the same directory block are arranged in the order of the preset rule. The number of directory entries in different directory blocks may be the same or different.
  • the file name area of the target directory block in the n directory blocks includes m (m ≥ 1) file names, the directory entry area of the target directory block includes m directory entries, the m file names correspond one-to-one to the m directory entries, and the m directory entries and the m file names are arranged in the preset rule order.
  • After determining the target directory block, that is, the directory block whose file name range contains the file name to be accessed, the data query device in the embodiment of the present application determines, according to the binary search algorithm and the target directory block, a current first set including x consecutive file names of the m file names and a current second set including the x file names, a first file name, and a second file name, and determines a first common prefix between the file name to be accessed and the file names in the current second set. The data query device then compares, character by character starting from the first character after the first common prefix, the file name to be accessed with the file name of the first preset location in the current first set (that is, the third file name).
  • When the file name to be accessed is the same as the third file name, the data query device acquires the data of the file to be accessed according to the directory entry corresponding to the third file name.
  • the data query device determines a first common prefix between the file name to be accessed and the file names in the current second set. Since the current second set covers the current first set, the first common prefix is shared between the file name to be accessed and every file name in the current first set, so the data query device can directly compare the file name to be accessed with the third file name character by character from the first character after the first common prefix, thereby effectively improving the rate of querying the file to be accessed. In addition, compared with the existing directory tree structures, the directory structure in the embodiment of the present application stores only directory entries and file names, and does not store other information related to file names or directory entries, thereby effectively improving the utilization of the storage space.
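  • The query inside the target directory block can be sketched in Python as a binary search that skips the first common prefix. The sketch assumes lexicographic order, takes the middle name of the current range as the third file name, and treats the first and second file names as the names just outside the current range; these readings, and the function names, are assumptions made for illustration only.

```python
def compare_from(a, b, start):
    """Compare a with b, assuming a[:start] == b[:start].

    Returns (sign, shared): sign is -1, 0 or 1, and shared is the length
    of the prefix common to a and b.
    """
    i = start
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    if i < limit:
        return (-1 if a[i] < b[i] else 1), i
    return (len(a) > len(b)) - (len(a) < len(b)), i


def find_in_block(names, target):
    """Return the index of target in the sorted list names, or -1."""
    lo, hi = 0, len(names) - 1
    lo_shared = 0  # prefix shared with the name just below the current range
    hi_shared = 0  # prefix shared with the name just above the current range
    while lo <= hi:
        mid = (lo + hi) // 2
        skip = min(lo_shared, hi_shared)  # first common prefix
        sign, shared = compare_from(target, names[mid], skip)
        if sign == 0:
            return mid
        if sign < 0:
            hi = mid - 1
            hi_shared = shared
        else:
            lo = mid + 1
            lo_shared = shared
    return -1


# Hypothetical file names of one directory block, in lexicographic order.
block_names = ["ANDY", "ANGEL", "ANNA", "APPLE", "ATTENT"]
position = find_in_block(block_names, "ANGEL")
```

  • In the sample call, the last comparison (against ANGEL itself) starts at the third character, because both neighbouring names ANDY and ANNA already share the prefix "AN" with the file name to be accessed. The returned index would then select the directory entry at the same position, since entries and names are stored in the same order.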
  • the data query device in the embodiment of the present application may be a terminal such as a computer, a mobile phone, or a tablet computer.
  • FIG. 4 is a schematic structural diagram of a data query apparatus according to an embodiment of the present application.
  • the data query device includes a communication interface 40, a processor 41, and a storage medium 42.
  • the communication interface 40, the processor 41, and the storage medium 42 are connected by the system bus 44 and communicate with each other through it.
  • the communication interface 40 is used to communicate with other devices, such as sharing data of a certain file with other devices.
  • the storage medium 42 can be used to store data of directory files, data of common files, and software programs and application modules; the processor 41 executes the software programs and application modules stored in the storage medium 42, thereby performing the various functional applications of the data query device.
  • the storage medium 42 includes a memory 42-1 and an external memory 42-2.
  • the memory 42-1 is used to temporarily store the operation data of the processor 41, the data exchanged with the external memory 42-2, and the like.
  • the external memory 42-2 is used to store data of application programs, directory files, and common files.
  • the directory file is composed of n (n ≥ 1) directory blocks, each directory block includes a directory entry area and a file name area, the directory entry area includes at least one directory entry, and the file name area includes at least one file name.
  • the number of directory entries in the same directory block is the same as the number of file names, and all directory entries and all file names in the same directory block are arranged in the order of the preset rule.
  • the number of directory entries in different directory blocks may be the same or different.
  • a detailed explanation of the directory block refers to the subsequent description, which will not be described in detail herein.
  • the operating system may be a Windows operating system or a Linux operating system.
  • the external memory 42-2 is a non-volatile memory, such as at least one magnetic disk storage device, an electrically erasable programmable read-only memory (EEPROM), or a flash memory device, for example, NOR flash memory or NAND flash memory.
  • the nonvolatile memory stores an operating system and an application executed by the processor 41.
  • the processor 41 can load the program to be run and its data from the nonvolatile memory into the memory 42-1, and store data content in a dedicated storage device.
  • the storage medium 42 may exist independently and be coupled to the processor 41 via a system bus 44.
  • the storage medium 42 can also be integrated with the processor 41.
  • the processor 41 is the control center of the data query device.
  • the processor 41 connects the various parts of the entire data query device by using various interfaces and lines, and performs the various functions of the data query device and processes data by running or executing the software programs and/or application modules stored in the storage medium 42 and invoking the data stored in the storage medium 42, thereby monitoring the data query device as a whole.
  • the processor 41 may include only a central processing unit (CPU), or may be a combination of a CPU, a digital signal processor (DSP), and a control chip in the communication unit.
  • the CPU may be a single operation core, and may also include a multi-operation core.
  • the processor 41 may include one or more CPUs, for example, the processor 41 in FIG. 4 includes a CPU 0 and a CPU 1.
  • the system bus 44 may be a circuit that interconnects the above components and communicates between the components.
  • the system bus 44 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or an Advanced Microcontroller Bus Architecture (AMBA) bus.
  • the system bus 44 can be divided into an address bus, a data bus, a control bus, and the like.
  • the various buses are all illustrated as the system bus 44 in FIG. 4.
  • the data query method provided by the embodiment of the present application is applicable to an application scenario in which the file system of the data query device is a read-only file system.
  • the directory structure stored in the external memory 42-2 of the data query device in the embodiment of the present application is first introduced.
  • each of the partitions includes a super block, an index area, and a data area.
  • for the super block, the index area, and the data area, reference may be made to the description of the structure shown in FIG. 2 above, and details are not described herein again.
  • the directory file in the data area of the embodiment of the present application also includes n directory blocks.
  • each directory block in the n directory blocks includes a directory entry area and a file name area. The directory entry area includes multiple directory entries (m directory entries are used as an example for description, m ≥ 1), the file name area includes multiple file names (represented by m file names), the m directory entries correspond one-to-one to the m file names, and the m directory entries and the m file names are arranged in the preset rule order. Since the directory entries correspond one-to-one to the file names, each directory entry holds the storage address of its corresponding file name.
  • the number of the directory entries included in the different directory blocks may be the same or different, which is not specifically limited in this embodiment of the present application.
  • the directory entries in the embodiment of the present application are slightly different from the existing directory entries.
  • Existing directory entries include information such as file name, index number, and file type.
  • the directory entry in the embodiment of the present application includes information such as an index number, a file type, and an offset of the file name in the directory block, and does not include the file name.
  • For example, the existing directory entry corresponding to the file name "ABC" includes information such as S1, ABC, and the common file type, whereas the directory entry corresponding to the file name "ABC" in the embodiment of the present application includes information such as S1, 32, and the common file type, where 32 is the offset in the directory block at which the file name is stored.
  • the directory entries in the subsequent content represent directory entries that do not include the file name.
  • the structure in which the directory entries and the file names are stored in the embodiment of the present application enables the storage space of the data query device to be used effectively, thereby improving the utilization of the storage space of the data query device.
  • the file name area in the embodiment of the present application is located after the directory entry area, and the two are adjacent, so that the offset of the file name included in the first directory entry in the directory entry area indicates not only the storage address of the first file name but also the end of the directory entry area. Since all directory entries in the same directory block have the same size, the data query device can calculate the number of directory entries once the first directory entry indicates the end of the directory entry area.
  • FIG. 5 shows a partition structure of the storage medium 42 of the data query device in the embodiment of the present application.
  • the data area includes n directory blocks (directory block 1, directory block 2, ..., directory block n), and each directory block includes a directory entry area and a file name area. The directory entry area of the directory block 1 includes m directory entries (directory entry 1, directory entry 2, ..., directory entry m), and the file name area of the directory block 1 includes m file names (file name 1, file name 2, ..., file name m). Directory entry 1 includes the storage location of the file name 1, directory entry 2 includes the storage location of the file name 2, and so on, and directory entry m includes the storage location of the file name m.
  • the data query device can calculate that the number of directory entries in the directory block 1 is m based on the storage location of the file name 1 and the start position of the directory block.
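  • The calculation of the number of directory entries can be sketched in Python as follows. The byte layout is hypothetical: each directory entry is assumed to be 8 bytes (a 4-byte index number, a 2-byte file type, and a 2-byte file name offset measured from the start of the directory block), file names are stored back to back without terminators, and the last file name is assumed to run to the end of the block. None of these field widths come from the application.

```python
import struct

# Hypothetical fixed-size entry: index number, file type, file name offset.
ENTRY = struct.Struct("<IHH")


def build_block(files):
    """Pack (name, index_number, file_type) tuples into one directory block."""
    files = sorted(files)                      # lexicographic order by name
    names_start = ENTRY.size * len(files)      # file name area follows entries
    entries, names = [], b""
    for name, index_number, file_type in files:
        offset = names_start + len(names)
        entries.append(ENTRY.pack(index_number, file_type, offset))
        names += name.encode("ascii")
    return b"".join(entries) + names


def entry_count(block):
    """The first entry's name offset marks the end of the entry area."""
    _, _, first_name_offset = ENTRY.unpack_from(block, 0)
    return first_name_offset // ENTRY.size


def read_names(block):
    """Recover all file names using only the offsets in the entries."""
    count = entry_count(block)
    offsets = [ENTRY.unpack_from(block, i * ENTRY.size)[2] for i in range(count)]
    offsets.append(len(block))                 # assumed end of the last name
    return [block[offsets[i]:offsets[i + 1]].decode("ascii")
            for i in range(count)]


block = build_block([("CAT", 4, 1), ("ABC", 1, 1), ("B", 3, 2), ("ABD", 2, 1)])
count = entry_count(block)
names = read_names(block)
```

  • With four 8-byte entries, the first file name starts at offset 32, so dividing that offset by the entry size yields the entry count without storing it separately.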
  • the directory block may include only one directory entry and the file name of file A.
  • Figure 6 shows a directory block that includes only one directory entry and one file name.
  • the file name may be stored in the first directory entry of the next directory block of the directory block, or the file name may be stored in a certain data block, with offset information pointing to that data block stored in the file name area of the directory block.
  • n directory blocks in the embodiment of the present application and all file names in each directory block are arranged in a preset rule order.
  • the preset rule may be lexicographic order, reverse lexicographic order, or another ordering, which is not specifically limited in the embodiment of the present application.
  • each directory block there is a one-to-one correspondence between directory entries and file names. Therefore, all directory entries in the directory block are arranged in the same order as all file names in the directory block.
  • the storage manner of the n directory blocks may be sequential storage between blocks or stored in a complete binary tree manner between blocks.
  • Sequential storage between blocks means that the n directory blocks are stored in the preset rule order of n file names, where each of the n file names is the file name of the second preset location in the directory block corresponding to that file name.
  • the file name of the second preset location in a directory block may be the first file name of the directory block, the last file name of the directory block, or any file name other than the first and last file names of the directory block, which is not specifically limited in the embodiment of the present application.
  • the file name of the second preset location in a directory block is the first file name in the directory block or the last file name in the directory block.
  • For example, the first file names of the directory blocks included in the data query device are ANDY, BABY, CAFE, DASH, and EMMA. Arranged in lexicographic order, the five file names are, in order: ANDY, BABY, CAFE, DASH, EMMA. Therefore, directory block 1 in the data query device is the directory block whose first file name is ANDY, directory block 2 is the directory block whose first file name is BABY, directory block 3 is the directory block whose first file name is CAFE, directory block 4 is the directory block whose first file name is DASH, and directory block 5 is the directory block whose first file name is EMMA.
  • Storage in a complete binary tree manner between blocks means that the n directory blocks are stored as a complete binary tree.
  • the n directory blocks in the embodiment of the present application are arranged according to the preset rule based on the file name of the fourth preset location in each directory block.
  • the file name of the fourth preset location in a directory block may be the first file name of the directory block, the last file name of the directory block, or any file name other than the first and last file names of the directory block, which is not specifically limited in the embodiment of the present application.
  • For example, the file name of the fourth preset location in a directory block is the first file name in the directory block, and the first file names of the directory blocks included in the data query device are ANDY, BABY, CAFE, DASH, and EMMA. Arranged in lexicographic order, the five file names are: ANDY, BABY, CAFE, DASH, EMMA. Stored in a complete binary tree manner, the order of the file names is: DASH, BABY, EMMA, ANDY, CAFE. Therefore, directory block 1 in the data query device is the directory block whose first file name is DASH, directory block 2 is the directory block whose first file name is BABY, directory block 3 is the directory block whose first file name is EMMA, directory block 4 is the directory block whose first file name is ANDY, and directory block 5 is the directory block whose first file name is CAFE.
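  • One way to obtain this ordering is sketched below in Python. The sketch assumes that the complete binary tree is held in a level-order array in which the children of position i are at positions 2i+1 and 2i+2, and that an in-order traversal of the tree visits the names in lexicographic order; the function name is hypothetical and the application does not prescribe this construction.

```python
def complete_tree_layout(sorted_names):
    """Arrange sorted names in the level-order array of a complete binary
    search tree, so that an in-order walk yields the sorted order."""
    n = len(sorted_names)
    layout = [None] * n
    remaining = iter(sorted_names)

    def fill(i):
        if i >= n:
            return
        fill(2 * i + 1)               # left subtree receives the smaller names
        layout[i] = next(remaining)   # then the node itself
        fill(2 * i + 2)               # right subtree receives the larger names

    fill(0)
    return layout


order = complete_tree_layout(["ANDY", "BABY", "CAFE", "DASH", "EMMA"])
```

  • For the five sample names this produces DASH, BABY, EMMA, ANDY, CAFE, which matches the block order given in the example above.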
  • the data query method provided by the present application is described below by taking sequential storage between blocks as the storage mode of the n directory blocks. Specifically, the method is described in detail with reference to the structure of the data query device shown in FIG. 4, the partition structure of the storage medium 42 of the data query device shown in FIG. 5, and the structure of the directory blocks in the data query device described above.
  • FIG. 9 is a schematic flowchart diagram of a data query method according to an embodiment of the present application.
  • the data query method provided by the embodiment of the present application specifically includes the following steps.
  • S900. The data query device obtains a file access request, where the file access request includes a file name to be accessed.
  • While running, an application in the data query device triggers a command to obtain a file to be accessed.
  • The name of the file to be accessed is referred to as the file name to be accessed.
  • The command to obtain the file to be accessed is also referred to as a file access request, and the file access request includes the file name to be accessed.
  • For example, the data query device obtains a request to access file a.
  • S901. The data query device determines the target directory block from the n directory blocks.
  • The data query device in the embodiments of the present application includes n directory blocks. Each directory block includes a directory entry area and a file name area; the directory entry area includes at least one directory entry, and the file name area includes at least one file name.
  • The file names in each directory block are arranged in order according to a preset rule. Therefore, the file names of each directory block form a file name range.
  • For example, if directory block 1 includes three file names, ANDY, APPLE, and ATTENT, the file name range of directory block 1 is [ANDY, ATTENT].
  • The data query device therefore first determines the file name range in which the file name to be accessed is located, and then searches the directory block so determined for the file name to be accessed.
  • That is, the data query device determines the target directory block from the n directory blocks such that the file name to be accessed lies within the file name range formed by the file names of the target directory block.
  • Although the file name to be accessed lies within the file name range formed by the file names of the target directory block, it does not necessarily belong to the file names included in the target directory block. In other words, the target directory block may or may not include the file name to be accessed.
  • For example, directory block 1 includes three file names: ANDY, APPLE, and ATTENT. The file name range of directory block 1 is [ANDY, ATTENT], and the file name to be accessed, ANGEL, lies within [ANDY, ATTENT], but ANGEL is not one of the file names included in directory block 1.
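  • The distinction between lying within a block's file name range and actually being stored in the block can be illustrated with a short Python sketch. It assumes that the preset rule is plain lexicographic order, which Python's string comparison provides; the helper name `in_name_range` is hypothetical.

```python
def in_name_range(block_names, fname):
    """Return True if fname lies within [first, last] of a sorted directory block.

    Assumes the block's names are sorted lexicographically (an assumption
    standing in for the "preset rule" of the description).
    """
    return block_names[0] <= fname <= block_names[-1]


block1 = ["ANDY", "APPLE", "ATTENT"]
print(in_name_range(block1, "ANGEL"))  # True: ANGEL falls inside [ANDY, ATTENT]
print("ANGEL" in block1)               # False: ANGEL is not stored in the block
```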
  • The directory entry area of the target directory block includes m directory entries, the file name area of the target directory block includes m file names, and the m directory entries are in one-to-one correspondence with the m file names.
  • Each directory entry includes information such as the index number, the file type, and the offset of the file name in the directory block.
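  • As an illustration of this layout, the following Python sketch models a directory block with a directory entry area and a file name area. It is a sketch under stated assumptions, not the structure used in the application: the class and field names, the NUL-terminated ASCII encoding of names, and measuring the offset from the start of the name area are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    index_number: int  # index number of the file
    file_type: int     # file type code (the encoding is an assumption)
    name_offset: int   # offset of the file name inside the name area


@dataclass
class DirBlock:
    entries: list      # directory entry area: m DirEntry objects
    name_area: bytes   # file name area: m NUL-terminated names (assumed)

    def name_of(self, entry):
        """Read the file name that a directory entry points to."""
        end = self.name_area.index(b"\0", entry.name_offset)
        return self.name_area[entry.name_offset:end].decode("ascii")


def build_block(files):
    """files: iterable of (name, index_number, file_type); names are stored sorted."""
    entries, area = [], bytearray()
    for name, index_number, file_type in sorted(files):
        entries.append(DirEntry(index_number, file_type, len(area)))
        area += name.encode("ascii") + b"\0"
    return DirBlock(entries, bytes(area))


block = build_block([("APPLE", 12, 1), ("ANDY", 7, 1), ("ATTENT", 30, 1)])
print([block.name_of(e) for e in block.entries])  # ['ANDY', 'APPLE', 'ATTENT']
```

  • Because entry k and name k are created together, the one-to-one correspondence between directory entries and file names holds by construction.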
  • the method for the data query device to determine the target directory block is:
  • Step A2: The data query device determines the current third set and the current fourth set according to the binary search algorithm and the n directory blocks. The current third set includes p file names, the p file names consist of the file name at the second preset position in each of p directory blocks, and the p file names are arranged in order according to the preset rule.
  • The p directory blocks are consecutive directory blocks among the n directory blocks. The current fourth set includes the p file names, the fourth file name, and the fifth file name.
  • The fourth file name is the file name that is arranged before, and adjacent to, the first of the p file names; the fifth file name is the file name that is arranged after, and adjacent to, the last of the p file names; 1 ≤ p ≤ n.
  • Step B2: The data query device determines a second common prefix between the file name to be accessed and the file names in the current fourth set.
  • Step C2: The data query device compares the file name to be accessed with the sixth file name character by character, starting from the first character after the second common prefix, where the sixth file name is the file name at the third preset position in the current third set.
  • the data querying device determines that the directory block to which the sixth file name belongs is the target directory block.
  • The file name range formed by the file names in the current fourth set is larger than the file name range formed by the file names in the current third set, because the current fourth set also includes the fourth file name and the fifth file name.
  • Initially, the fourth file name and the fifth file name are empty.
  • Consequently, every file name in the current third set also begins with the second common prefix that the file name to be accessed shares with the file names in the current fourth set. After determining the second common prefix, the data query device can therefore compare the file name to be accessed with the sixth file name character by character starting from the first character after the second common prefix, which improves query efficiency.
  • The second common prefix is initialized to empty.
  • The data query device determines the second common prefix between the file name to be accessed and the file names in the current fourth set as follows: it determines the third prefix shared between the file name to be accessed and the fourth file name, and the fourth prefix shared between the file name to be accessed and the fifth file name; it then takes the shorter of the third prefix and the fourth prefix as the second common prefix.
  • For example, if the current fourth set is {ANDY, BABY, CAFE, DASH, EMMA}, the current third set is {BABY, CAFE, DASH}, the fourth file name is ANDY, the fifth file name is EMMA, and the file name to be accessed is CORE, the data query device determines that the third prefix is empty and the fourth prefix is empty, and therefore that the second common prefix is empty.
  • As another example, the current fourth set is {A, AC, ACB, ACD, AD, B, C} and the current third set is {AC, ACB, ACD, AD, B}.
  • The sixth file name is the file name at the third preset position in the current third set. The file name at the third preset position may be, for example, the file name in the middle of the current third set, or a file name at another position in the current third set; this is not specifically limited in the embodiments of the present application.
  • After comparing the file name to be accessed with the sixth file name character by character, the data query device can determine whether the two file names are the same.
  • If they are the same, the data query device determines that the directory block to which the sixth file name belongs is the target directory block.
  • In this case, the data query device may also skip determining the target directory block: it can directly determine the directory entry corresponding to the file name to be accessed according to the sixth file name, and then obtain the data of the file to be accessed according to that directory entry.
  • If they are different and p is greater than 1, the data query device re-determines the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm, and performs step B2 and step C2 above according to the re-determined third set and the re-determined fourth set.
  • The re-determined third set includes all file names in the current third set that are arranged either before or after the sixth file name.
  • If the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that are arranged after the sixth file name. If the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that are arranged before the sixth file name.
  • For example, the current third set includes the file names ANDY, BABY, CAFE, DASH, and EMMA in FIG. 7, and the sixth file name is CAFE. If the file name to be accessed is ANDY, the re-determined third set includes the file names ANDY and BABY. If the file name to be accessed is DASH, the re-determined third set includes the file names DASH and EMMA.
  • The data query device then performs step B2 above according to the re-determined fourth set to re-determine the second common prefix.
  • For example, the current fourth set is {A, AC, ACB, ACD, AD, B, C}, the current third set is {AC, ACB, ACD, AD, B}, and the file name to be accessed is ACC. The current second common prefix is empty (the current third prefix is "A" and the current fourth prefix is empty, so the current second common prefix is empty), and the file name to be accessed, ACC, and the file name ACD are compared character by character from the first character according to the current second common prefix.
  • Because the feature value of ACC is smaller than the feature value of ACD, the data query device re-determines the third set and the fourth set: the re-determined third set is {AC, ACB}, and the re-determined fourth set is {A, AC, ACB, ACD}.
  • The data query device re-determines the second common prefix based on the re-determined fourth set.
  • The fourth file name in the re-determined fourth set is A, and the fifth file name in the re-determined fourth set is ACD.
  • Accordingly, the third prefix is "A" and the fourth prefix is "AC".
  • The data query device takes the shorter of "A" and "AC" as the second common prefix; that is, the re-determined second common prefix is "A".
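  • The rule for the second common prefix (the shorter of the prefixes shared with the two neighbouring file names) can be sketched in Python as follows. The function names are hypothetical, and `None` is used here as an assumed stand-in for an empty fourth or fifth file name.

```python
def shared_prefix(a, b):
    """Longest prefix that strings a and b have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def common_prefix(fname, before, after):
    """Shorter of the prefixes fname shares with its two neighbours.

    `before` plays the role of the fourth file name and `after` the role of
    the fifth file name; None models an empty neighbour (an assumption).
    """
    head = shared_prefix(fname, before) if before is not None else ""
    end = shared_prefix(fname, after) if after is not None else ""
    return head if len(head) <= len(end) else end


print(repr(common_prefix("CORE", "ANDY", "EMMA")))  # ''
print(repr(common_prefix("ACC", "A", "C")))         # ''
print(repr(common_prefix("ACC", "A", "ACD")))       # 'A'
```

  • Both candidate prefixes are prefixes of the file name to be accessed, so the shorter one is always a prefix of the longer one and the choice is unambiguous.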
  • If p is equal to 1, the data query device determines the target directory block according to the single file name included in the current third set. Specifically, in the case where the file name at the second preset position is the first file name of the directory block: if the feature value of the file name to be accessed is greater than the feature value of the file name in the current third set, the data query device determines the directory block to which the file name in the current third set belongs as the target directory block; if the feature value of the file name to be accessed is smaller than the feature value of the file name in the current third set, the data query device determines, as the target directory block, the directory block to which the file name located before and adjacent to the file name in the current third set belongs.
  • For example, each file name is the first file name in the corresponding directory block. If the current third set includes the nine file names A, B, C, D, E, F, G, H, and I, the sixth file name is E, and the file name to be accessed is EA, the process by which the data query device determines the target directory block is as follows.
  • 1. The second common prefix is empty, and the data query device compares the file name to be accessed, EA, with the file name E character by character from the first character according to the second common prefix. Because the feature value of EA is greater than the feature value of E, the data query device re-determines the third set as {F, G, H, I}.
  • 2. The third set re-determined in step 1 is {F, G, H, I}, so the current third set is {F, G, H, I} and the sixth file name is G. The second common prefix is empty in this step, and the data query device compares the file name to be accessed, EA, with the file name G character by character from the first character according to the second common prefix. Because the feature value of EA is smaller than the feature value of G, the data query device re-determines the third set as {F} and the fourth set as {E, F, G}.
  • 3. The third set re-determined in step 2 is {F} and the re-determined fourth set is {E, F, G}, so the current third set is {F} and the current fourth set is {E, F, G}. The second common prefix determined by the data query device according to {E, F, G} is empty, and the data query device compares the file name to be accessed, EA, with the file name F character by character from the first character according to the current second common prefix. Because the current third set includes only one file name, F, and the feature value of EA is smaller than the feature value of F, the target directory block is the directory block to which the file name E, located before and adjacent to F, belongs.
  • As another example, the file name to be accessed is CORE, and the data query device determines the target directory block as follows. At this time, the fourth file name and the fifth file name are empty, and the second common prefix is empty. If the sixth file name is CAFE, the data query device compares the file name to be accessed, CORE, with the file name CAFE character by character from the first character according to the second common prefix.
  • The data query device re-determines the third set as {DASH}, and re-determines that the second common prefix is empty. The current third set is now {DASH} and the current second common prefix is empty, so the data query device compares the file name CORE with the file name DASH character by character from the first character according to the current second common prefix.
  • Because the current third set includes only one file name, DASH, and the feature value of the file name to be accessed, CORE, is smaller than the feature value of the file name DASH, the data query device determines that the target directory block is the directory block to which the file name CAFE, located before and adjacent to DASH, belongs.
  • That is, the data query device determines that the target directory block is directory block 3.
  • For example, each file name is the first file name in the corresponding directory block. If the current third set is {A, AC, ACB, ACD, AD, B, C}, the fourth file name and the fifth file name are empty, the sixth file name is ACD, and the file name to be accessed is ACC, the process by which the data query device determines the target directory block is as follows.
  • 1. The data query device compares the file name to be accessed, ACC, with the sixth file name ACD character by character from the first character according to the second common prefix. Because the feature value of ACC is smaller than the feature value of ACD, the data query device re-determines the third set as {A, AC, ACB} and the fourth set as {empty, A, AC, ACB, ACD}.
  • 2. The third set re-determined in step 1 is {A, AC, ACB}, so the current third set in step 2 is {A, AC, ACB} and the sixth file name is AC. The fourth set re-determined in step 1 is {empty, A, AC, ACB, ACD}, so the current fourth set in step 2 is {empty, A, AC, ACB, ACD}; the fourth file name in the current fourth set is empty, and the fifth file name in the current fourth set is ACD. The current third prefix is therefore empty and the current fourth prefix is "AC", so the re-determined second common prefix is empty. The data query device compares the file name to be accessed, ACC, with the file name AC character by character from the first character according to the second common prefix. Because the feature value of ACC is greater than the feature value of AC, the data query device re-determines the third set as {ACB} and the fourth set as {AC, ACB, ACD}.
  • 3. The third set re-determined in step 2 is {ACB} and the re-determined fourth set is {AC, ACB, ACD}, so in step 3 the current third set is {ACB} and the current fourth set is {AC, ACB, ACD}. The fourth file name in the current fourth set is AC, so the third prefix is "AC"; the fifth file name in the current fourth set is ACD, so the fourth prefix is "AC". The data query device takes the shorter of the third prefix "AC" and the fourth prefix "AC" as the second common prefix, so the current second common prefix is "AC". According to the current second common prefix, the data query device compares the file name to be accessed, ACC, with the file name ACB character by character from the first character after "AC". The current third set includes only one file name, ACB, and the feature value of ACC is greater than the feature value of ACB. Therefore, the target directory block is the directory block to which ACB belongs.
  • When the file name at the second preset position is the first file name in the directory block and the file name at the third preset position is the first file name in the current third set, that is, the sixth file name is the first file name in the current third set, if the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, then regardless of whether p is greater than 1, the data query device determines, as the target directory block, the directory block to which the file name located before and adjacent to the sixth file name belongs.
  • The process by which the data query device determines the target directory block is as follows.
  • 1. The fourth file name and the fifth file name are empty, and the second common prefix is empty. If the sixth file name is CAFE, the data query device compares the file name to be accessed, CORE, with the file name CAFE character by character from the first character according to the second common prefix. The third prefix shared between the file name to be accessed, CORE, and the file name CAFE is "C"; the fourth prefix shared between CORE and the fifth file name is empty; therefore, the re-determined second common prefix is empty.
  • 2. The third set re-determined by the data query device in step 1 is {DASH, EMMA}, so in this step the current third set is {DASH, EMMA}. If the sixth file name is DASH, the data query device compares the file name to be accessed, CORE, with the file name DASH character by character from the first character according to the current second common prefix. The data query device determines that the target directory block is the directory block to which the file name CAFE, located before and adjacent to DASH, belongs; that is, the target directory block is directory block 3.
  • When the file name at the second preset position is the first file name in the directory block and the file name at the third preset position is the last file name in the current third set, that is, the sixth file name is the last file name in the current third set, if the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the data query device determines the directory block to which the sixth file name belongs as the target directory block.
  • For example, the file name to be accessed is END, and the sixth file name is the last file name in the current third set, EMMA. Because the feature value of END is greater than the feature value of EMMA, the data query device determines that the directory block to which EMMA belongs is the target directory block.
  • The pseudo code by which the data query device determines the target directory block may be the following code:
  • In the code, headprefix is the third prefix, endprefix is the fourth prefix, and the query closed interval corresponds to the third set.
  • As the search proceeds, the query closed interval keeps shrinking. In each query closed interval, the data query device compares the file name to be accessed, fname, with the file name dirent0name[mid] located in the middle of the query closed interval, character by character starting from the first character after the common prefix, which effectively increases the rate of querying the target directory block.
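  • The search described above can be sketched in Python as follows. This is an illustrative sketch, not the pseudo code of the application: it reuses the names headprefix, endprefix, fname, and mid from the description, while the function names, the use of prefix lengths instead of prefix strings, the lexicographic comparison standing in for the feature value, and the choice of the first file name of each block as its representative are all assumptions.

```python
def lcp_len(a, b, start=0):
    """Length of the prefix shared by a and b, given that the first
    `start` characters are already known to match."""
    n = start
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def find_target_block(first_names, fname):
    """Return the index of the block whose first file name is the largest
    one not greater than fname, or None if fname precedes every block."""
    lo, hi = 0, len(first_names) - 1  # query closed interval (the third set)
    headprefix = endprefix = 0        # prefix lengths; both neighbours start empty
    target = None
    while lo <= hi:
        mid = (lo + hi) // 2
        name = first_names[mid]
        # Skip the characters covered by the common prefix.
        k = lcp_len(fname, name, min(headprefix, endprefix))
        if k == len(fname) and k == len(name):
            return mid                # exact match with a first file name
        if k == len(name) or (k < len(fname) and fname[k] > name[k]):
            target = mid              # fname is greater: this block is a candidate
            lo, headprefix = mid + 1, k
        else:
            hi, endprefix = mid - 1, k
    return target


print(find_target_block(list("ABCDEFGHI"), "EA"))                              # 4 (block of E)
print(find_target_block(["ANDY", "BABY", "CAFE", "DASH", "EMMA"], "CORE"))     # 2 (block of CAFE)
print(find_target_block(["A", "AC", "ACB", "ACD", "AD", "B", "C"], "ACC"))     # 2 (block of ACB)
```

  • The returned indices are zero-based, so index 2 in the second call corresponds to directory block 3 in the CORE example.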
  • After determining the target directory block in S901, the data query device searches the target directory block for a file name that is the same as the file name to be accessed.
  • The directory entry corresponding to the file name to be accessed is referred to as the target directory entry.
  • After the data query device finds the file name that is the same as the file name to be accessed, it can determine the target directory entry and obtain the data of the file to be accessed according to the target directory entry.
  • Specifically, the data query device executes S902 and then performs the subsequent steps in sequence.
  • S902. The data query device determines the current first set and the current second set according to the binary search algorithm and the target directory block.
  • The current first set includes x consecutive file names among the m file names.
  • The current second set includes the x file names, the first file name, and the second file name, where m ≥ x ≥ 1.
  • The first file name is the file name that is arranged before, and adjacent to, the first of the x file names.
  • The second file name is the file name that is arranged after, and adjacent to, the last of the x file names.
  • the target directory block includes m file names, and the data query device queries the m file names in the target directory block according to the binary search algorithm.
  • the data querying device determines, according to the binary search algorithm and the target directory block, a current first set including x file names and a current second set including the x file names, the first file name, and the second file name.
  • The query interval is continuously narrowed during the query, and in each query interval the data query device needs to compare the file name to be accessed with a file name in that interval.
  • the embodiment of the present application is described by taking the current query interval as the current first set as an example.
  • the first file name and the second file name are empty.
  • S903. The data query device determines a first common prefix between the file name to be accessed and the file names in the current second set.
  • The file name range formed by all file names in the current second set is larger than the file name range formed by all file names in the current first set; therefore, all file names in the current first set also share the first common prefix.
  • The first common prefix is initialized to empty if the first file name and the second file name are empty.
  • The data query device determines the first common prefix as follows: it determines the first prefix shared between the file name to be accessed and the first file name, and the second prefix shared between the file name to be accessed and the second file name; it then takes the shorter of the first prefix and the second prefix as the first common prefix.
  • For example, the target directory block includes seven sequentially arranged file names: CAFE, CAGE, CAK, CELL, CORN, DAB, DACE. If the current first set is {CAFE, CAGE, CAK, CELL, CORN, DAB, DACE}, the first file name and the second file name are both empty, the third file name is CELL, and the file name to be accessed is CAK, then, because the first file name and the second file name are both empty, the first prefix and the second prefix are both empty. Correspondingly, the current first common prefix is empty.
  • If the current first set is {CAFE, CAGE, CAK}, the first file name is empty, and the second file name is CELL, then the first prefix shared between the first file name and the file name to be accessed, CAK, is empty, and the second prefix shared between CAK and the second file name CELL is "C". The length of the first prefix is smaller than the length of the second prefix. Therefore, the current first common prefix is empty.
  • S904. The data query device compares the file name to be accessed with the third file name character by character, starting from the first character after the first common prefix.
  • The third file name is the file name at the first preset position in the current first set.
  • The file name at the first preset position may be, for example, the file name in the middle of the current first set, or a file name at another position in the current first set; this is not specifically limited in the embodiments of the present application.
  • Because the data query device, after determining the first common prefix, compares the file name to be accessed with the third file name character by character starting from the first character after the first common prefix, the rate at which it finds the file name that is the same as the file name to be accessed is increased.
  • After comparing the file name to be accessed with the third file name character by character, the data query device can determine whether the two file names are the same.
  • If they are the same, the file corresponding to the third file name is the file to be accessed, and the target directory entry is the directory entry corresponding to the third file name.
  • The data query device can then obtain the data of the file to be accessed according to the target directory entry. In this case, after executing S904, the data query device continues with step S905 below.
  • If they are different, the data query device needs to determine the size relationship between the feature value of the file name to be accessed and the feature value of the third file name, re-determine the first set and the second set according to that relationship, and perform step B1 and step C1 above according to the re-determined first set and the re-determined second set, until the data of the file to be accessed is obtained or it is determined that the target directory block does not include the file name to be accessed.
  • If the feature value of the file name to be accessed is smaller than the feature value of the third file name, the first set re-determined by the data query device includes all file names in the current first set that are located before the third file name. If the feature value of the file name to be accessed is greater than the feature value of the third file name, the first set re-determined by the data query device includes all file names in the current first set that are located after the third file name.
  • For example, the current first set is {CAFE, CAGE, CAK, CELL, CORN, DAB, DACE}, and the third file name is CELL.
  • If the file name to be accessed is DACE, the feature value of DACE is greater than the feature value of the third file name CELL, so the re-determined first set is {CORN, DAB, DACE}.
  • If the file name to be accessed is CAGE, the feature value of CAGE is smaller than the feature value of the third file name CELL, so the re-determined first set is {CAFE, CAGE, CAK}.
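  • If the feature value of the file name to be accessed is smaller than the feature value of the third file name, the second set re-determined by the data query device includes the first file name, all file names in the current first set that are located before the third file name, and the third file name.
  • If the feature value of the file name to be accessed is greater than the feature value of the third file name, the second set re-determined by the data query device includes the third file name, all file names in the current first set that are located after the third file name, and the second file name.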
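  • A single narrowing step, as described above, can be sketched in Python as follows. The function name `narrow` and its parameters are hypothetical, `None` is an assumed stand-in for an empty first or second file name, and lexicographic comparison stands in for the comparison of feature values.

```python
def narrow(first_set, before, after, fname, third):
    """Re-determine the first set and the second set after one comparison.

    first_set: current first set (sorted list of file names)
    before, after: current first and second file names (None models "empty")
    third: the third file name, which the caller has found to differ from fname
    """
    i = first_set.index(third)
    if fname < third:
        new_first = first_set[:i]                     # names before the third file name
        new_second = [before] + new_first + [third]
    else:
        new_first = first_set[i + 1:]                 # names after the third file name
        new_second = [third] + new_first + [after]
    return new_first, new_second


names = ["CAFE", "CAGE", "CAK", "CELL", "CORN", "DAB", "DACE"]
first, second = narrow(names, None, None, "CAK", "CELL")
print(first)   # ['CAFE', 'CAGE', 'CAK']
print(second)  # [None, 'CAFE', 'CAGE', 'CAK', 'CELL']
print(narrow(first, None, "CELL", "CAK", "CAGE"))  # (['CAK'], ['CAGE', 'CAK', 'CELL'])
```

  • The third file name always remains in the re-determined second set as a neighbour, which is what allows the next common prefix to be computed from it.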
  • S905. The data query device acquires the data of the file to be accessed according to the target directory entry.
  • Specifically, the data query device obtains the index number of the file to be accessed from the target directory entry, obtains the index of the file to be accessed according to that index number, and then obtains the data of the file to be accessed according to the index.
  • For example, the current first set is {CAFE, CAGE, CAK, CELL, CORN, DAB, DACE}, the third file name is CELL, the first file name and the second file name are both empty, and the file name to be accessed is CAK. The process by which the data query device queries the file name to be accessed is as follows.
  • 1. The first common prefix is initialized to empty, and the data query device compares the file name to be accessed, CAK, with the file name CELL character by character according to the first common prefix. Because the feature value of CAK is smaller than the feature value of CELL, the data query device re-determines the first set as {CAFE, CAGE, CAK}; in the re-determined second set, the first file name is empty and the second file name is CELL. The re-determined first common prefix is therefore empty (the second prefix shared between CAK and CELL is "C", and the first prefix shared between CAK and the first file name is empty, so the re-determined first common prefix is empty).
  • 2. The first set re-determined in step 1 is {CAFE, CAGE, CAK}, so the current first set in step 2 is {CAFE, CAGE, CAK}; correspondingly, the current first common prefix is empty. The data query device compares the file name to be accessed, CAK, with the file name CAGE character by character according to the current first common prefix. Because the feature value of CAK is greater than the feature value of CAGE, the data query device re-determines the first set as {CAK} and the second set as {CAGE, CAK, CELL}.
  • 3. The first set re-determined in step 2 is {CAK} and the re-determined second set is {CAGE, CAK, CELL}, so in step 3 the current first set is {CAK} and the current second set is {CAGE, CAK, CELL}. The data query device determines that the current first common prefix is "C", and compares the file name to be accessed, CAK, with the file name CAK in the current first set character by character from the first character after "C". Because the file name CAK in the current first set is the same as the file name to be accessed, the data query device determines that the target directory entry is the directory entry corresponding to CAK. The data query device then acquires the data of the file to be accessed according to the target directory entry.
  • The pseudo code by which the data query device queries the target directory block for the file name to be accessed may be the following code:
  • In the code, headprefix is the first prefix, endprefix is the second prefix, and fname is the file name to be accessed.
  • The query closed interval corresponds to the first set.
  • As the search proceeds, the query interval keeps shrinking. In each query interval, the data query device compares the file name to be accessed, fname, with the file name didentname[mid] located in the middle of the query interval, character by character starting from the first character after the common prefix, which effectively increases the query rate.
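  • The in-block lookup can be sketched in Python in the same way. Again this is an illustrative sketch rather than the pseudo code of the application: headprefix, endprefix, fname, and mid follow the description, while the function name, the use of prefix lengths, the lexicographic comparison, and the returned list of skipped lengths (added only to make the effect of the common prefix visible) are assumptions.

```python
def lookup_in_block(names, fname):
    """Search a sorted directory block for fname.

    Returns (index or None, skips), where skips records how many leading
    characters were skipped at each comparison thanks to the common prefix.
    """
    lo, hi = 0, len(names) - 1   # query closed interval (the first set)
    headprefix = endprefix = 0   # prefix lengths; both neighbours start empty
    skips = []
    while lo <= hi:
        mid = (lo + hi) // 2
        name = names[mid]
        k = min(headprefix, endprefix)  # length of the first common prefix
        skips.append(k)
        while k < len(fname) and k < len(name) and fname[k] == name[k]:
            k += 1
        if k == len(fname) and k == len(name):
            return mid, skips           # same file name found
        if k == len(name) or (k < len(fname) and fname[k] > name[k]):
            lo, headprefix = mid + 1, k  # fname is greater: keep the later names
        else:
            hi, endprefix = mid - 1, k   # fname is smaller: keep the earlier names
    return None, skips


block = ["CAFE", "CAGE", "CAK", "CELL", "CORN", "DAB", "DACE"]
print(lookup_in_block(block, "CAK"))  # (2, [0, 0, 1])
print(lookup_in_block(block, "CAB"))  # (None, [0, 0, 0])
```

  • For CAK the skipped lengths 0, 0, 1 correspond to the common prefixes empty, empty, and "C" in the three steps of the example above.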
  • The flow of the data query method provided by the embodiments of the present application is still the flow shown in FIG. 9, and the data query device can also execute S900 to S905.
  • What differs is the method by which the data query device determines the target directory block from the n directory blocks.
  • The method by which the data query device determines the target directory block from the n directory blocks is explained below.
  • the method for the data query device to determine the target directory block is:
  • Step A3: The data query device determines the current candidate directory block and the current third common prefix.
  • Step B3: The data query device compares the file name to be accessed with the i-th file name character by character, starting from the first character after the current third common prefix, where the i-th file name is a file name in the i-th directory block of the n directory blocks.
  • The data query device re-determines the candidate directory block as the directory block to which the i-th file name belongs, and determines the re-determined candidate directory block as the target directory block.
  • The i-th file name is the file name at the fourth preset position in the i-th directory block of the n directory blocks.
  • The file name at the fourth preset position may be the first file name in the corresponding directory block, the last file name in the corresponding directory block, or another file name in the corresponding directory block; this is not specifically limited in the embodiments of the present application.
  • the data query device After the data query device compares the file name to be accessed and the i-th file name character by character from the first character after the current third common prefix, it can be determined whether the file name to be accessed is the same as the i-th file name.
  • the data query device can directly determine the directory entry corresponding to the file name to be accessed according to the ith file name, and obtain the data of the file to be accessed according to the directory entry corresponding to the file name to be accessed.
  • the data querying device re-determines the third common prefix, the candidate directory block, and the i-th file name, and re-determines the candidate directory block according to the re-determined third common prefix. And the re-determined i-th file name, re-execute step B3 above until the target directory block is determined.
  • the above-mentioned re-determined i-th file is named as the file name of the fourth preset position in the j-th directory block among the n directory blocks.
  • Similar to the way the data query device determines the common prefix in the scenario where the n directory blocks are stored sequentially between blocks, the data query device also determines a common prefix in the scenario where the n directory blocks are stored as a complete binary tree between blocks.
  • The data query device determines a third common prefix.
  • The data query device re-determines the third common prefix as follows. When the feature value of the file name to be accessed is greater than the feature value of the i-th file name,
  • the data query device updates the current first target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determines the shorter of the updated first target prefix and the current second target prefix as the re-determined third common prefix.
  • When the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, the data query device updates the current second target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determines the shorter of the current first target prefix and the updated second target prefix as the re-determined third common prefix.
  • The initial values of the length of the first target prefix and the length of the second target prefix are both zero, and both lengths change with the size relationship between the feature value of the file name to be accessed and the feature value of the i-th file name.
  • The data query device re-determines the candidate directory block as follows: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, the data query device determines that the candidate directory block is the directory block to which the i-th file name belongs; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, the data query device determines that the candidate directory block is the same as the current candidate directory block.
  • The data query device searches the target directory block for a file name identical to the file name to be accessed.
  • The process by which the data query device determines the target directory block is as follows.
  • 1. The initial value of the candidate directory block is null and the initial value of the third common prefix is empty, so the data query device compares the file name to be accessed, CORE, with the 0th file name, DASH, character by character from the first character. Since the feature value of CORE is smaller than the feature value of DASH, the data query device needs to further compare CORE with the 1st (2*0+1) file name, BABY.
  • The third common prefix re-determined by the data query device is still empty.
  • 2. The data query device compares the file name to be accessed, CORE, with the 1st file name, BABY, character by character from the first character. Since the feature value of CORE is greater than the feature value of BABY, the data query device changes the candidate directory block to the directory block to which BABY belongs (that is, directory block 2). In addition, the data query device needs to further compare CORE with the 4th (2*1+2) file name, CAFE. The third common prefix re-determined by the data query device is still empty.
  • 3. The data query device compares the file name to be accessed, CORE, with the file name CAFE character by character from the first character. Since the feature value of CORE is greater than the feature value of CAFE, the data query device changes the candidate directory block to the directory block to which CAFE belongs (that is, directory block 5). The data query device would next compare the file name to be accessed with the 10th (2*4+2) file name; however, the 10th file name does not exist. The data query device therefore determines that the directory block to which CAFE belongs is the target directory block, that is, the target directory block is directory block 5.
  • In the embodiments of the present application, the pseudo code by which the data query device determines the target directory block may be the following code:
  • In the code, headprefix corresponds to the first target prefix, and endprefix corresponds to the second target prefix.
  • The data query device compares the file name to be accessed, fname, with the i-th file name, dirent0name[i], character by character from the first character after the common prefix, which effectively increases the speed of querying the target directory block.
  • The data query method provided by the embodiments of the present application effectively increases the speed of querying the file to be accessed, whether the n directory blocks are stored sequentially between blocks or as a complete binary tree between blocks.
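The walk over a complete binary tree of directory blocks can be sketched in Python as below. This is an illustrative reconstruction under assumptions, not the original pseudo code: the function name `find_target_block` is hypothetical, `dirent0name` is assumed to be a list holding the first file name of each block in level order (so the children of node i are 2i+1 and 2i+2), and feature values are compared by plain string comparison. The names `headprefix`, `endprefix`, `fname`, and `dirent0name` follow the identifiers mentioned in the text.

```python
def find_target_block(dirent0name, fname):
    """Return the level-order index of the target directory block,
    or None if fname sorts before every block's first file name."""
    n = len(dirent0name)
    candidate = None   # current candidate directory block
    headprefix = 0     # length of the first target prefix
    endprefix = 0      # length of the second target prefix
    i = 0
    while i < n:
        name = dirent0name[i]
        # Third common prefix: the shorter of the two target prefixes.
        matched = min(headprefix, endprefix)
        limit = min(len(fname), len(name))
        while matched < limit and fname[matched] == name[matched]:
            matched += 1
        if matched == len(fname) == len(name):
            return i                      # exact match: this block is the target
        if name[matched:] < fname[matched:]:
            candidate = i                 # fname is greater: remember this block
            headprefix = matched
            i = 2 * i + 2                 # descend to the right child
        else:
            endprefix = matched           # fname is smaller: keep the candidate
            i = 2 * i + 1                 # descend to the left child
    return candidate


# First file names in level order, as in the CORE example: index 0 is
# directory block 1, index 4 is directory block 5.
tree = ["DASH", "BABY", "EMMA", "ANDY", "CAFE"]
print(find_target_block(tree, "CORE"))
```

With the five first names used in the worked example, the walk visits DASH, BABY, and CAFE and stops when index 10 falls outside the tree, leaving index 4 (directory block 5 when blocks are numbered from 1) as the target.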
  • An embodiment of the present application provides a data query device, which is used to perform the steps performed by the data query device in the above data query method.
  • The data query device provided by the embodiment of the present application may include modules corresponding to the respective steps.
  • In the embodiments of the present application, the data query device may be divided into function modules according to the foregoing method examples.
  • For example, each function module may correspond to one function, or two or more functions may be integrated into one processing module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • The division of modules in the embodiments of the present application is schematic and is only a logical function division; other divisions are possible in actual implementation.
  • FIG. 14 shows a possible structural diagram of the data query device involved in the above embodiments.
  • The data query device 140 includes a processing unit 1400 and an obtaining unit 1401.
  • The processing unit 1400 is configured to support the data query device 140 in performing S901, S902, S903, S904, etc. in the above embodiments, and/or other processes for the techniques described herein.
  • The obtaining unit 1401 is configured to support the data query device 140 in performing S900, S905, etc. in the above embodiments, and/or other processes for the techniques described herein.
  • The data query device 140 provided by the embodiment of the present application includes but is not limited to the foregoing modules.
  • The data query device may further include a storage unit 1402.
  • The storage unit 1402 can be used to store program code and data of the data query device 140.
  • The data query device 150 includes a processing module 1500 and a communication module 1501.
  • The processing module 1500 is configured to control and manage the actions of the data query device 150, for example, to perform the steps performed by the processing unit 1400 described above, and/or other processes for the techniques described herein.
  • The communication module 1501 is configured to support interaction between the data query device 150 and other devices, for example, to perform the steps performed by the obtaining unit 1401 described above.
  • The data query device 150 may further include a storage module 1502 for storing program code and data of the data query device 150, for example, the content saved by the storage unit 1402.
  • The processing module 1500 can be a processor or a controller, for example, a CPU, a general-purpose processor, a DSP, an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • The communication module 1501 may be a transceiver, an RF circuit, a communication interface, or the like.
  • The storage module 1502 can be a memory.
  • The processing module 1500 can be the processor 41 of FIG. 4.
  • The communication module 1501 can be the communication interface 40 of FIG. 4.
  • The storage module 1502 can be the storage medium 42 of FIG. 4.
  • The data query device 140 and the data query device 150 can perform the data query method shown in FIG. 9 above.
  • The data query device 140 and the data query device 150 may specifically be terminals.
  • The application also provides a terminal, the terminal comprising one or more processors, a memory, and a communication interface.
  • The memory and the communication interface are coupled to the one or more processors; the memory is used to store computer program code, and the computer program code includes instructions that, when executed by the one or more processors, cause the terminal to perform the data query method of an embodiment of the present application.
  • The terminals here can be video display devices, smart phones, laptops, and other devices that can process or play video.
  • Another embodiment of the present application also provides a computer readable storage medium including one or more pieces of program code, the program code including instructions; when a processor in a terminal executes the program code, the terminal performs the data query method shown in FIG. 9.
  • Another embodiment provides a computer program product comprising computer executable instructions stored in a computer readable storage medium; at least one processor of the terminal can read the computer executable instructions from the computer readable storage medium, and the at least one processor executes them so that the terminal performs the steps of the data query device in the data query method shown in FIG. 9.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented using a software program, they may take the form of a computer program product in whole or in part.
  • The computer program product includes one or more computer instructions.
  • The computer can be a general purpose computer, a special purpose computer, a computer network, or another programmable device.
  • The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
  • The disclosed apparatus and method may be implemented in other manners.
  • The device embodiments described above are merely illustrative.
  • The division of the modules or units is only a logical function division.
  • In actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not performed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in an electrical, mechanical, or other form.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a readable storage medium.
  • The technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions that cause a device (which may be a microcontroller, a chip, etc.) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

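A data query method and apparatus, relating to the field of computer technologies, which solve the problem of low efficiency in querying a file name to be accessed. The method includes: determining a target directory block that includes m directory entries and m file names, where the m directory entries and the m file names are in one-to-one correspondence and are arranged in order according to a preset rule; determining a current first set and a current second set according to a binary search algorithm and the target directory block, where the current first set includes x consecutive file names among the m file names, and the current second set includes the x file names, a first file name, and a second file name, m ≥ x ≥ 1; determining a first common prefix between the file name to be accessed and the file names in the current second set; comparing the file name to be accessed with a third file name character by character, starting from the first character after the first common prefix; and, if the file name to be accessed is the same as the third file name, obtaining the data of the file to be accessed according to the directory entry corresponding to the third file name.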

Description

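Data Query Method and Apparatus
Technical Field
The embodiments of the present application relate to the field of computer technologies, and in particular to a data query method and apparatus.
Background
A file system usually organizes the topological relationship of files in a tree structure; here, files cover only directory files and ordinary files. Generally, a leaf node of the tree represents an ordinary file, and every other node represents a directory file. A directory file includes multiple directory entries, each of which includes a file name, a file type, and an index (inode) number; a computer can obtain the data of a file from the inode identified by the inode number. To obtain a file (file A, for example), the computer needs to look up the file name in the directory file to obtain the directory entry corresponding to the file name of file A, and then obtain the data of file A according to that directory entry.
At present, in some file systems (such as the Flash Friendly File System (F2FS) and the Fourth Extended File System (EXT4)), the directory structure is a hash tree comprising multiple levels of hash tables, where each level of hash table includes multiple hash values, the file name corresponding to each hash value, and the inode number of the file. In other file systems (such as the New Technology File System (NTFS) and the B-tree file system (Btrfs)), the directory structure is a B+ tree of order n (n ≥ 1).
Consider an application scenario in which the file system is read-only. If the directory structure of the file system is a hash tree comprising multiple levels of hash tables, then when obtaining the data of a file to be accessed the computer has to query the hash tables level by level, and in each hash table it first traverses the hash values and then matches the file name; when the hash tables have many levels, querying the file name to be accessed in the hash tables is inefficient. In addition, the hash tree contains a large number of hash values, which lowers the effective utilization of storage space; and when the hash tables have many levels, high-level hash tables may not be full, which lowers the effective utilization of storage space further. If the directory structure of the file system is an n-order B+ tree, then when obtaining the data of the file to be accessed the computer either queries sequentially starting from the smallest key or queries randomly starting from the root node, so querying the file name to be accessed is relatively inefficient; moreover, the keys in the leaf nodes of an n-order B+ tree also appear in intermediate nodes, which lowers the effective utilization of storage space.
In summary, in an application scenario in which the file system is read-only, with the existing directory structures the computer queries the file name to be accessed inefficiently, and the effective utilization of the computer's storage space is low.
Summary
The embodiments of the present application provide a data query method and apparatus, which can solve the problems that a computer queries the file name to be accessed inefficiently and that the effective utilization of the computer's storage space is low.
To achieve the above objective, the embodiments of the present application adopt the following technical solutions: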
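According to a first aspect, a data query method is provided. The method is applied to a read-only file system that includes n (n ≥ 1) directory blocks, each of which includes a directory entry area and a file name area. Specifically, the method is as follows. A data query device determines a target directory block from the n directory blocks, where the directory entry area of the target directory block includes m directory entries, the file name area of the target directory block includes m file names, the m directory entries are in one-to-one correspondence with the m file names, the m directory entries and the m file names are all arranged in order according to a preset rule, the file name to be accessed lies within a file name range, the file name range is the range formed by the first file name and the last file name of the target directory block, and m ≥ 1. After determining the target directory block, the data query device performs step A1, step B1, and step C1 in sequence. Step A1: determine a current first set and a current second set according to a binary search algorithm and the target directory block, where the current first set includes x consecutive file names among the m file names, the current second set includes the x (m ≥ x ≥ 1) file names, a first file name, and a second file name, the first file name is the file name that precedes and is adjacent to the first of the x file names, and the second file name is the file name that follows and is adjacent to the last of the x file names. Step B1: determine a first common prefix between the file name to be accessed and the file names in the current second set. Step C1: starting from the first character after the first common prefix, compare the file name to be accessed with a third file name character by character, where the third file name is the file name at a first preset position in the current first set. After step C1, if the file name to be accessed is the same as the third file name, the data query device obtains the data of the file to be accessed according to the directory entry corresponding to the third file name.
The data query device in the present application determines the first common prefix between the file name to be accessed and the file names in the current second set. Because the current second set covers the first set, all file names in the current first set also share the first common prefix with the file name to be accessed. The data query device can therefore compare the file name to be accessed with the third file name character by character directly from the first character after the first common prefix, which effectively increases the speed of querying the file to be accessed.
In addition, compared with existing directory tree structures, the directory structure in the embodiments of the present application stores only directory entries and file names, and no other information related to file names or directory entries, which effectively improves the utilization of storage space.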
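Optionally, in a possible implementation of the present application, if the file name to be accessed differs from the third file name, the data query device re-determines the first set and the second set according to the binary search algorithm, the current first set, and the current second set, and performs step B1 and step C1 on the re-determined first set and the re-determined second set, until the data of the file to be accessed is obtained or it is determined that the target directory block does not include the file name to be accessed.
It follows from the definition of the binary search algorithm that, if the file name to be accessed differs from the third file name, the data query device further narrows the query range (that is, re-determines the first set) and re-determines the first common prefix.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order. In this case, the method of "re-determining the first set and the second set according to the binary search algorithm, the current first set, and the current second set if the file name to be accessed differs from the third file name" is as follows: if the feature value of the file name to be accessed is smaller than the feature value of the third file name, the re-determined first set includes all file names in the current first set that precede the third file name, and the re-determined second set includes the first file name, all file names in the current first set that precede the third file name, and the third file name; or, if the feature value of the file name to be accessed is greater than the feature value of the third file name, the re-determined first set includes all file names in the current first set that follow the third file name, and the re-determined second set includes the third file name, all file names in the current first set that follow the third file name, and the second file name.
Optionally, in another possible implementation of the present application, the method of "determining the first common prefix between the file name to be accessed and the file names in the current second set" is: determine a first prefix shared by the file name to be accessed and the first file name; determine a second prefix shared by the file name to be accessed and the second file name; and determine the shorter of the first prefix and the second prefix as the first common prefix.
In the embodiments of the present application, the file name range formed by all file names in the current second set is larger than the file name range formed by all file names in the current first set. Because all file names in each directory block are arranged in order according to the preset rule, the data query device can determine the first prefix shared by the file name to be accessed and the first file name and the second prefix shared by the file name to be accessed and the second file name, and take the shorter of the two as the first common prefix.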
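Optionally, in another possible implementation of the present application, the method by which "the data query device determines the target directory block from the n directory blocks" is that the data query device performs steps A2, B2, and C2 in sequence. Step A2: determine a current third set and a current fourth set according to the binary search algorithm and the n directory blocks, where the current third set includes p file names, the p file names comprise the file name at a second preset position in each of p directory blocks, the file names in the current third set are arranged in order according to the preset rule, the p directory blocks are consecutive directory blocks among the n directory blocks, the current fourth set includes the p file names, a fourth file name, and a fifth file name, the fourth file name is the file name that precedes and is adjacent to the first of the p file names, the fifth file name is the file name that follows and is adjacent to the last of the p file names, and 1 ≤ p ≤ n. Step B2: determine a second common prefix between the file name to be accessed and the file names in the current fourth set. Step C2: starting from the first character after the second common prefix, compare the file name to be accessed with a sixth file name character by character, where the sixth file name is the file name at a third preset position in the current third set. After step C2, if the file name to be accessed is the same as the sixth file name, the data query device determines that the directory block to which the sixth file name belongs is the target directory block.
When the n directory blocks are stored sequentially between blocks, the method by which the data query device determines the target directory block is similar to the method by which it queries the file name to be accessed.
Optionally, in another possible implementation of the present application, when the file name to be accessed differs from the sixth file name: if 2 ≤ p ≤ n, the data query device re-determines the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm, and performs step B2 and step C2 on the re-determined third set and the re-determined fourth set; if p = 1, the data query device determines the target directory block according to the file name included in the current third set.
Optionally, in another possible implementation of the present application, where the preset rule is lexicographic order and, for each directory block, the file name at the second preset position is the first file name of that directory block, the method of "determining the target directory block according to the file name included in the current third set when p = 1" is: if the feature value of the file name to be accessed is greater than the feature value of the file name in the current third set, the data query device determines the directory block to which the file name in the current third set belongs as the target directory block; or, if the feature value of the file name to be accessed is smaller than the feature value of the file name in the current third set, the data query device determines as the target directory block the directory block to which the file name that precedes and is adjacent to the file name in the current third set belongs.
Optionally, in another possible implementation of the present application, where the preset rule is lexicographic order, the method of "re-determining the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm" is: if the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that precede the sixth file name, and the re-determined fourth set includes the fourth file name, all file names in the current third set that precede the sixth file name, and the sixth file name; or, if the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the re-determined third set includes all file names in the current third set that follow the sixth file name, and the re-determined fourth set includes the sixth file name, all file names in the current third set that follow the sixth file name, and the fifth file name.
Optionally, in another possible implementation of the present application, where the preset rule is lexicographic order and, for each directory block, the file name at the second preset position is the first file name of that directory block: if the sixth file name is the first file name in the current third set and the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, the directory block to which the file name that precedes and is adjacent to the sixth file name belongs is determined as the target directory block; if the sixth file name is the last file name in the current third set and the feature value of the file name to be accessed is greater than the feature value of the sixth file name, the directory block to which the sixth file name belongs is determined as the target directory block.
Optionally, in another possible implementation of the present application, the method of "determining the second common prefix between the file name to be accessed and the file names in the current fourth set" is: determine a third prefix shared by the file name to be accessed and the fourth file name; determine a fourth prefix shared by the file name to be accessed and the fifth file name; and determine the shorter of the third prefix and the fourth prefix as the second common prefix.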
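Optionally, in another possible implementation of the present application, in a scenario where the n directory blocks are arranged in order according to the preset rule and stored as a complete binary tree, the method by which "the data query device determines the target directory block from the n directory blocks" is that the data query device performs step A3 and step B3 in sequence. Step A3: determine a current candidate directory block and a current third common prefix. Step B3: starting from the first character after the current third common prefix, compare the file name to be accessed with the i-th file name character by character, where the i-th file name is the file name at a fourth preset position in the i-th directory block of the n directory blocks, 0 ≤ i < n. After step B3, if the file name to be accessed is the same as the i-th file name, the data query device re-determines the candidate directory block as the directory block to which the i-th file name belongs, and determines the re-determined candidate directory block as the target directory block.
Optionally, in another possible implementation of the present application, if the file name to be accessed differs from the i-th file name, the data query device re-determines the third common prefix, the candidate directory block, and the i-th file name, where the re-determined i-th file name is the file name at the fourth preset position in the j-th directory block of the n directory blocks: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, j = 2i+2; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, j = 2i+1; 0 ≤ i < j < n. The data query device re-executes step B3 according to the re-determined third common prefix, the re-determined candidate directory block, and the re-determined i-th file name, until the target directory block is determined.
Optionally, in another possible implementation of the present application, where the preset rule is lexicographic order and the file name at the fourth preset position is the first file name in the corresponding directory block, the method of "re-determining the candidate directory block if the file name to be accessed differs from the i-th file name" is: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, the re-determined candidate directory block is the directory block to which the i-th file name belongs; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, the re-determined candidate directory block is the current candidate directory block.
Optionally, in another possible implementation of the present application, where the preset rule is lexicographic order and the file name at the fourth preset position is the first file name in the corresponding directory block, the method of "re-determining the third common prefix if the file name to be accessed differs from the i-th file name" is: when the feature value of the file name to be accessed is greater than the feature value of the i-th file name, update the current first target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determine the shorter of the updated first target prefix and the current second target prefix as the re-determined third common prefix; or, when the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, update the current second target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determine the shorter of the current first target prefix and the updated second target prefix as the re-determined third common prefix. The initial values of the length of the first target prefix and the length of the second target prefix are both zero, and both lengths change with the size relationship between the feature value of the file name to be accessed and the feature value of the i-th file name.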
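According to a second aspect, a read-only file system is provided. The objects of the read-only file system include a directory file composed of n directory blocks. Each directory block includes a directory entry area and a file name area; the directory entry area includes at least one directory entry, and the file name area includes at least one file name. Within the same directory block, the number of directory entries equals the number of file names, and all directory entries and all file names in the directory block are arranged in order according to a preset rule.
Optionally, in a possible implementation of the present application, each of the at least one directory entry includes an inode number, a file type, and the offset, within the directory block to which it belongs, of the file name corresponding to the directory entry; the file name area is adjacent to the directory entry area and located after it.
According to a third aspect, a data query device is provided, which has the read-only file system described in the second aspect or any possible implementation thereof. The data query device includes a processing unit and an obtaining unit.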
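Specifically, the processing unit is configured to determine a target directory block from the n directory blocks of the read-only file system, where the directory entry area of the target directory block includes m directory entries, the file name area of the target directory block includes m file names, the m directory entries are in one-to-one correspondence with the m file names, the m directory entries and the m file names are all arranged in order according to a preset rule, the file name to be accessed lies within a file name range, the file name range is the range formed by the first file name and the last file name of the target directory block, and m ≥ 1. The processing unit is further configured to perform step A1, step B1, and step C1. Step A1: determine a current first set and a current second set according to a binary search algorithm and the target directory block, where the current first set includes x consecutive file names among the m file names, the current second set includes the x file names, a first file name, and a second file name, the first file name is the file name that precedes and is adjacent to the first of the x file names, the second file name is the file name that follows and is adjacent to the last of the x file names, and m ≥ x ≥ 1. Step B1: determine a first common prefix between the file name to be accessed and the file names in the current second set. Step C1: starting from the first character after the first common prefix, compare the file name to be accessed with a third file name character by character, where the third file name is the file name at a first preset position in the current first set. The obtaining unit is configured to, if the processing unit determines that the file name to be accessed is the same as the third file name, obtain the data of the file to be accessed according to the directory entry corresponding to the third file name.
Optionally, in a possible implementation of the present application, the processing unit is further configured to, if the file name to be accessed differs from the third file name, re-determine the first set and the second set according to the binary search algorithm, the current first set, and the current second set, and perform step B1 and step C1 on the re-determined first set and the re-determined second set, until the obtaining unit obtains the data of the file to be accessed or the processing unit determines that the target directory block does not include the file name to be accessed.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order, and the processing unit is specifically configured to: if the feature value of the file name to be accessed is smaller than the feature value of the third file name, determine that the re-determined first set includes all file names in the current first set that precede the third file name, and that the re-determined second set includes the first file name, all file names in the current first set that precede the third file name, and the third file name; or, if the feature value of the file name to be accessed is greater than the feature value of the third file name, determine that the re-determined first set includes all file names in the current first set that follow the third file name, and that the re-determined second set includes the third file name, all file names in the current first set that follow the third file name, and the second file name.
Optionally, in another possible implementation of the present application, the processing unit is specifically configured to: determine a first prefix shared by the file name to be accessed and the first file name; determine a second prefix shared by the file name to be accessed and the second file name; and determine the shorter of the first prefix and the second prefix as the first common prefix.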
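Optionally, in another possible implementation of the present application, the processing unit is further configured to perform step A2, step B2, and step C2. Step A2: determine a current third set and a current fourth set according to the binary search algorithm and the n directory blocks, where the current third set includes p file names, the p file names comprise the file name at a second preset position in each of p directory blocks, the file names in the current third set are arranged in order according to the preset rule, the p directory blocks are consecutive directory blocks among the n directory blocks, the current fourth set includes the p file names, a fourth file name, and a fifth file name, the fourth file name is the file name that precedes and is adjacent to the first of the p file names, the fifth file name is the file name that follows and is adjacent to the last of the p file names, and 1 ≤ p ≤ n. Step B2: determine a second common prefix between the file name to be accessed and the file names in the current fourth set. Step C2: starting from the first character after the second common prefix, compare the file name to be accessed with a sixth file name character by character, where the sixth file name is the file name at a third preset position in the current third set. The processing unit is further configured to, if the file name to be accessed is the same as the sixth file name, determine that the directory block to which the sixth file name belongs is the target directory block.
Optionally, in another possible implementation of the present application, the processing unit is specifically configured to: when 2 ≤ p ≤ n, re-determine the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm, and perform step B2 and step C2 on the re-determined third set and the re-determined fourth set; when p = 1, determine the target directory block according to the file name included in the current third set.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order and, for each directory block, the file name at the second preset position is the first file name of that directory block. When p = 1, the processing unit is specifically configured to: if the feature value of the file name to be accessed is greater than the feature value of the file name in the current third set, determine the directory block to which the file name in the current third set belongs as the target directory block; or, if the feature value of the file name to be accessed is smaller than the feature value of the file name in the current third set, determine as the target directory block the directory block to which the file name that precedes and is adjacent to the file name in the current third set belongs.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order, and the processing unit is specifically configured to: if the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, determine that the re-determined third set includes all file names in the current third set that precede the sixth file name, and that the re-determined fourth set includes the fourth file name, all file names in the current third set that precede the sixth file name, and the sixth file name; or, if the feature value of the file name to be accessed is greater than the feature value of the sixth file name, determine that the re-determined third set includes all file names in the current third set that follow the sixth file name, and that the re-determined fourth set includes the sixth file name, all file names in the current third set that follow the sixth file name, and the fifth file name.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order and, for each directory block, the file name at the second preset position is the first file name of that directory block. The processing unit is further configured to: if the sixth file name is the first file name in the current third set and the feature value of the file name to be accessed is smaller than the feature value of the sixth file name, determine as the target directory block the directory block to which the file name that precedes and is adjacent to the sixth file name belongs; if the sixth file name is the last file name in the current third set and the feature value of the file name to be accessed is greater than the feature value of the sixth file name, determine the directory block to which the sixth file name belongs as the target directory block.
Optionally, in another possible implementation of the present application, the processing unit is specifically configured to: determine a third prefix shared by the file name to be accessed and the fourth file name; determine a fourth prefix shared by the file name to be accessed and the fifth file name; and determine the shorter of the third prefix and the fourth prefix as the second common prefix.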
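Optionally, in another possible implementation of the present application, the n directory blocks are arranged in order according to the preset rule and stored as a complete binary tree. Correspondingly, the processing unit is further configured to perform step A3 and step B3. Step A3: determine a current candidate directory block and a current third common prefix. Step B3: starting from the first character after the current third common prefix, compare the file name to be accessed with the i-th file name character by character, where the i-th file name is the file name at a fourth preset position in the i-th directory block of the n directory blocks, 0 ≤ i < n. The processing unit is further configured to, if the file name to be accessed is the same as the i-th file name, re-determine the candidate directory block as the directory block to which the i-th file name belongs, and determine the re-determined candidate directory block as the target directory block.
Optionally, in another possible implementation of the present application, the processing unit is further configured to, if the file name to be accessed differs from the i-th file name, re-determine the third common prefix, the candidate directory block, and the i-th file name, where the re-determined i-th file name is the file name at the fourth preset position in the j-th directory block of the n directory blocks: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, j = 2i+2; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, j = 2i+1; 0 ≤ i < j < n. The processing unit is further configured to re-execute step B3 according to the re-determined third common prefix, the re-determined candidate directory block, and the re-determined i-th file name, until the target directory block is determined.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order, and the file name at the fourth preset position is the first file name in the corresponding directory block. Correspondingly, the processing unit is specifically configured to: if the feature value of the file name to be accessed is greater than the feature value of the i-th file name, determine that the re-determined candidate directory block is the directory block to which the i-th file name belongs; if the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, determine that the re-determined candidate directory block is the current candidate directory block.
Optionally, in another possible implementation of the present application, the preset rule is lexicographic order, and the file name at the fourth preset position is the first file name in the corresponding directory block. Correspondingly, the processing unit is specifically configured to: when the feature value of the file name to be accessed is greater than the feature value of the i-th file name, update the current first target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determine the shorter of the updated first target prefix and the current second target prefix as the re-determined third common prefix; or, when the feature value of the file name to be accessed is smaller than the feature value of the i-th file name, update the current second target prefix to the prefix shared by the file name to be accessed and the i-th file name, and determine the shorter of the current first target prefix and the updated second target prefix as the re-determined third common prefix. The initial values of the length of the first target prefix and the length of the second target prefix are both zero, and both lengths change with the size relationship between the feature value of the file name to be accessed and the feature value of the i-th file name.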
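According to a fourth aspect, a terminal is provided. The terminal includes one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled to the one or more processors; the memory is used to store computer program code, the computer program code includes instructions, and when the one or more processors execute the instructions, the terminal performs the data query method described in the first aspect or any possible implementation thereof.
According to a fifth aspect, a computer readable storage medium is provided. The computer readable storage medium stores instructions that, when run on the terminal described in the fourth aspect, cause the terminal to perform the data query method described in the first aspect or any possible implementation thereof.
According to a sixth aspect, a computer program product containing instructions is provided. When the computer program product runs on the terminal described in the fourth aspect, it causes the terminal to perform the data query method described in the first aspect or any possible implementation thereof.
In the present application, the name of the data query device does not limit the devices or functional modules themselves; in actual implementation, these devices or functional modules may appear under other names. As long as the functions of the devices or functional modules are similar to those of the present application, they fall within the scope of the claims of the present application and their equivalent technologies.
For specific descriptions of the second to sixth aspects and their implementations, refer to the detailed descriptions of the first aspect and its implementations; likewise, for the beneficial effects of the second to sixth aspects and their implementations, refer to the analysis of beneficial effects in the first aspect and its implementations. Details are not repeated here.
These and other aspects of the present application will be more readily understood from the following description.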
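Brief Description of the Drawings
FIG. 1 is a schematic diagram of the query flow of the binary search algorithm;
FIG. 2 is a schematic diagram of the partition structure in the Linux operating system in the prior art;
FIG. 3 is a schematic diagram of the directory structure of the F2FS file system;
FIG. 4 is a schematic diagram of the hardware structure of the data query device in an embodiment of the present application;
FIG. 5 is a schematic diagram of the partition structure of the external memory 42~2 of the data query device in an embodiment of the present application;
FIG. 6 is a schematic diagram of the structural layout of a directory block of the data query device in an embodiment of the present application;
FIG. 7 is a first schematic diagram of the arrangement of directory blocks of the data query device in an embodiment of the present application;
FIG. 8 is a second schematic diagram of the arrangement of directory blocks of the data query device in an embodiment of the present application;
FIG. 9 is a schematic flowchart of the data query method provided by an embodiment of the present application;
FIG. 10 is a first schematic flowchart of the data query device determining the target directory block in an embodiment of the present application;
FIG. 11 is a second schematic flowchart of the data query device determining the target directory block in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of the second set in an embodiment of the present application;
FIG. 13 is a first schematic flowchart of the data query device querying the file name to be accessed in an embodiment of the present application;
FIG. 14 is a first schematic structural diagram of the data query device in an embodiment of the present application;
FIG. 15 is a second schematic structural diagram of the data query device in an embodiment of the present application.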
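Detailed Description
The terms "first", "second", "third", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to define a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of such words is intended to present the relevant concepts in a concrete manner.
To facilitate understanding of the embodiments of the present application, the relevant elements involved in the embodiments are explained first.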
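Binary search, also called half-interval search, works as follows: to look up data x among m elements arranged in ascending order (such as A1, A2, ..., Am), take the middle element (formula image PCTCN2018075300-appb-000001) and compare it with x. If the two are equal, the query ends. If x is smaller than the middle element, continue the query among all elements smaller than the middle element until an element equal to x is found; if x is greater than the middle element, continue the query among all elements greater than the middle element until an element equal to x is found. In the index expression, the bracket symbol (formula image PCTCN2018075300-appb-000006) denotes rounding down.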
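For example, as shown in FIG. 1, seven values arranged in ascending order are {1, 4, 7, 8, 10, 15, 20}. If the value to be queried is "7", the query proceeds as follows:
(1) Compare the middle (4th) value of {1, 4, 7, 8, 10, 15, 20} (index given by formula image PCTCN2018075300-appb-000007), "8", with "7". Since "7" is smaller than "8", continue the query among all values smaller than "8", that is, {1, 4, 7}.
(2) Compare the middle (2nd) value of {1, 4, 7} (index given by formula image PCTCN2018075300-appb-000008), "4", with "7". Since "7" is greater than "4", continue the query among the values greater than "4", that is, {7}.
(3) The value "7" in {7} equals the queried "7", and the query ends.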
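The three comparisons in this example can be reproduced with a short Python sketch. It is only an illustration: the function name `binary_search` and the returned list of probed values are hypothetical, and indices are 0-based, so the 4th value of the text is index 3.

```python
def binary_search(values, target):
    """Search the ascending list `values` for `target`.

    Returns (index, probes): index is the 0-based position of target or -1,
    and probes lists the middle values that were compared along the way.
    """
    low, high = 0, len(values) - 1
    probes = []
    while low <= high:
        mid = (low + high) // 2      # rounding down, as in the description
        probes.append(values[mid])
        if values[mid] == target:
            return mid, probes
        if target < values[mid]:
            high = mid - 1           # continue among the smaller elements
        else:
            low = mid + 1            # continue among the larger elements
    return -1, probes


index, probes = binary_search([1, 4, 7, 8, 10, 15, 20], 7)
print(index, probes)  # 2 [8, 4, 7]
```

The probed values 8, 4, and 7 match steps (1) to (3) above.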
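The embodiments of the present application are used to query file names, so in the embodiments the m elements above are m file names. The relative size of different file names can be determined from a feature value of the file name (for example, the rank of a file name after the file names are sorted lexicographically).
A file system provides a structured form of data storage and organization. It organizes the topological relationship of files in a tree structure, which makes it convenient for users to access and query files. To manage the sub-files and subdirectories under a directory, a directory file needs to store the names and index (inode) numbers of all sub-files under that directory. The file name is visible to the user, who manages and accesses files by file name. An inode includes the basic information of a file (such as file size, creation time, and modification time) and pointer information pointing to the data blocks that store the file's data; the computer can obtain the corresponding inode from the inode number. When an application needs to read a file, the data blocks storing the file's data can be determined from the pointer information in the file's inode, and the file's data is then obtained from those data blocks. Because the file system records the file name and inode number of each file, in file system application scenarios the user does not need to care which data blocks hold a file's data; remembering the directory to which the file belongs and the file name is enough to access the file's data.
Generally, the storage space provided by a storage medium includes multiple partitions, all of which are mounted on one or more file systems. The data of each file is stored in one of the partitions of the storage medium. Each partition of the storage medium is divided into multiple blocks. Within the same file system, every block has the same size, typically 1024 bytes or 4096 bytes. For ease of description, the embodiments of the present application call a block that stores the data of a directory file a directory block, and a block that stores the data of another type of file a data block.
As shown in FIG. 2, in the Linux operating system each partition of the storage medium contains a super block area (Super block), an index area, and a data area. The super block area stores information about the file system, such as the file system type, the number of blocks, and the block size. The index area includes k (k ≥ 1) inodes, each of which includes the basic information of a file (such as file size, creation time, and modification time) and pointer information pointing to the data blocks that store the file's data. The data area includes the data of ordinary files and directory files, where a directory file consists of n (n ≥ 1) directory blocks, each directory block includes multiple directory entries, one directory entry corresponds to one file, and each directory entry includes the inode number, the file name, and the file type of a file, among other information.
At present, in some file systems (such as the F2FS file system and the EXT4 file system), the directory structure is a hash tree comprising multiple levels of hash tables, where each level of hash table includes multiple hash values, the file name corresponding to each hash value, and the inode number of the file.
Take the F2FS file system as an example. The directory structure in F2FS is a hash tree comprising multiple levels of hash tables, and each level has a hash table that uses a dedicated number of hash buckets. Each hash bucket is an array of directory entries, and each directory entry in a hash bucket includes a hash value, the file name corresponding to that hash value, and the inode number of the file. Table 1 shows the directory structure in F2FS, which is a hash tree comprising N levels of hash tables. In Table 1, A denotes a bucket, B denotes a directory block, N denotes the maximum hash level, A(2B) indicates that a hash bucket includes two directory blocks, and A(4B) indicates that a hash bucket includes four directory blocks.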
Table 1
Level 0        A(2B)
Level 1        A(2B)-A(2B)
Level 2        A(2B)-A(2B)-A(2B)-A(2B)
……             ……
Level N/2      A(2B)-A(2B)-A(2B)-……A(2B)
Level N/2+1    A(4B)-A(4B)-A(4B)-……A(4B)
……             ……
Level N        A(4B)-A(4B)-A(4B)-A(4B)-……A(4B)
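It can be seen from Table 1 that, in the directory structure of the F2FS file system, from the level-0 hash table to the level-N/2 hash table, each hash bucket in each level of hash table includes two directory blocks; from the level-(N/2+1) hash table to the level-N hash table, each hash bucket in each level of hash table includes four directory blocks. The level-i (i < N/2) hash table includes 2^i hash buckets, and the level-j (j ≥ N/2) hash table includes 2^(N/2-1) hash buckets.
FIG. 3 shows the hash tree of Table 1. The level-0 hash table includes one hash bucket, Bucket 0, which includes two directory blocks, directory block 0 and directory block 1. The contents of the hash tables at the other levels are similar to those of the level-0 and level-1 hash tables and are not described in detail here.
Based on this hash tree structure, when a computer queries a file name to be accessed in a directory of the F2FS file system, it first computes the hash value of the file name to be accessed, and then scans the hash values in the level-0 hash table for a directory entry that includes the file name to be accessed and the inode number of the file to be accessed. If none is found, the computer scans the level-1 hash table. In other words, if the computer does not find a directory entry that includes the file name to be accessed in one level of hash table, it scans the next level of hash table, in increasing order of level. In each level of hash table the computer needs to scan only one hash bucket, whose number is the remainder of dividing the hash value of the file name to be accessed by the number of hash buckets in that level.
In an application scenario where the F2FS file system is read-only, the computer has to search directory entries linearly when querying the file name to be accessed. When the hash tables have many levels and there are many directory entries, the computer has to search many directory blocks, so query efficiency is low.
In addition, the hash tree contains a large number of hash values, which lowers the effective utilization of storage space. As the level increases, the number of hash buckets in the high-level hash tables increases accordingly, so it is very likely that a high-level hash table is not full, which further lowers the effective utilization of storage space.
Besides directories with a hash tree structure, in some other file systems (such as the NTFS file system and the Btrfs file system) the directory structure is a B+ tree of order n (n ≥ 1).
An n-order B+ tree is an n-ary sorted tree. A B+ tree includes a root node, internal nodes, and leaf nodes. The root node may be a leaf node, or a node with at least two child nodes. Every node of an n-order B+ tree includes n keys; the keys hold no data and are used only for indexing, and all data is held in the leaf nodes. The leaf nodes of an n-order B+ tree contain the information of all keys and pointer information pointing to those keys, and the leaf nodes themselves are linked in ascending order of key.
Based on this n-order B+ tree, when a computer queries a file name to be accessed in a directory, it either queries sequentially starting from the smallest key or queries randomly starting from the root node, so query efficiency is relatively low.
In addition, the keys in the leaf nodes of a B+ tree also appear in intermediate nodes, and this structure lowers the effective utilization of storage space.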
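To address the low query efficiency and the low effective utilization of computer storage space described above, the embodiments of the present application provide a data query method. The method is applied to a read-only file system that includes n (n ≥ 1) directory blocks. Each directory block includes a directory entry area and a file name area; the directory entry area includes at least one directory entry, and the file name area includes at least one file name. Within the same directory block, the number of directory entries equals the number of file names, and all directory entries and all file names are arranged in order according to a preset rule. Different directory blocks may contain the same or different numbers of directory entries. For example, the file name area of a target directory block among the n directory blocks includes m (m ≥ 1) file names, the directory entry area of the target directory block includes m directory entries, the m file names are in one-to-one correspondence with the m directory entries, and the m directory entries and the m file names are all arranged in order according to the preset rule.
After determining that the file name to be accessed lies within the file name range formed by the m file names in the target directory block, that is, after determining the target directory block, the data query device in the embodiments of the present application determines, according to the binary search algorithm and the target directory block, a current first set that includes x consecutive file names among the m file names and a current second set that includes the x file names, a first file name, and a second file name, and determines a first common prefix between the file name to be accessed and the file names in the current second set. The data query device then compares the file name to be accessed, character by character from the first character after the first common prefix, with the file name at a first preset position in the current first set (that is, the third file name). When the file name to be accessed is the same as the third file name, the data query device obtains the data of the file to be accessed according to the directory entry corresponding to the third file name. The data query device determines the first common prefix between the file name to be accessed and the file names in the current second set; because the current second set covers the first set, all file names in the current first set also share the first common prefix with the file name to be accessed. The data query device can therefore compare the file name to be accessed with the third file name character by character directly from the first character after the first common prefix, which effectively increases the speed of querying the file to be accessed. In addition, compared with existing directory tree structures, the directory structure in the embodiments of the present application stores only directory entries and file names, and no other information related to file names or directory entries, which effectively improves the utilization of storage space.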
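The data query device in the embodiments of the present application may be a terminal such as a computer, a mobile phone, or a tablet computer. FIG. 4 is a schematic structural diagram of a data query device provided by an embodiment of the present application. Referring to FIG. 4, the data query device includes a communication interface 40, a processor 41, and a storage medium 42, which are connected through a system bus 44 and communicate with one another.
The communication interface 40 is used to communicate with other devices, for example, to share the data of a file with other devices.
The storage medium 42 can be used to store the data of directory files, the data of ordinary files, and software programs and application modules. The processor 41 executes the various functional applications of the data query device by running the software programs and application modules stored in the storage medium 42.
The storage medium 42 includes an internal memory 42~1 and an external memory 42~2. The internal memory 42~1 is used to temporarily store the operation data of the processor 41 and the data exchanged with the external memory 42~2. The external memory 42~2 is used to store applications and the data of directory files and ordinary files. In the embodiments of the present application, a directory file consists of n (n ≥ 1) directory blocks. Each directory block includes a directory entry area and a file name area; the directory entry area includes at least one directory entry, and the file name area includes at least one file name. Within the same directory block, the number of directory entries equals the number of file names, and all directory entries and all file names are arranged in order according to a preset rule. Different directory blocks may contain the same or different numbers of directory entries. Directory blocks are explained in detail in the subsequent description and are not described in detail here. In the implementations of the present application, the operating system may be a Windows operating system or a Linux operating system.
In the embodiments of the present application, the external memory 42~2 is a non-volatile memory, for example, at least one magnetic disk storage device, an Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory device such as NOR flash memory or NAND flash memory. The non-volatile memory stores the operating system and applications executed by the processor 41. The processor 41 can load programs and data from the non-volatile memory into the internal memory 42~1, and store data content in a storage device dedicated to storage.
The storage medium 42 may exist independently and be connected to the processor 41 through the system bus 44. The storage medium 42 may also be integrated with the processor 41.
The processor 41 is the control center of the data query device. The processor 41 connects the parts of the entire data query device through various interfaces and lines, and performs the various functions of the storage device and processes data by running or executing the software programs and/or application modules stored in the storage medium 42 and calling the data stored in the storage medium 42, thereby monitoring the data query device as a whole.
The processor 41 may include only a Central Processing Unit (CPU), or may be a combination of a CPU, a Digital Signal Processor (DSP), and a control chip in a communication unit. In the implementations of the present application, the CPU may have a single computing core or multiple computing cores. In a specific implementation, as an embodiment, the processor 41 may include one or more CPUs; for example, the processor 41 in FIG. 4 includes CPU 0 and CPU 1.
The system bus 44 may be a circuit that interconnects the above elements and carries communication between them. For example, the system bus 44 is an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or an Advanced Microcontroller Bus Architecture (AMBA) bus. The system bus 44 can be divided into an address bus, a data bus, a control bus, and so on. For clarity, all of these buses are shown as the system bus 44 in FIG. 4.
It should be noted that the data query method provided by the embodiments of the present application applies to the application scenario in which the file system of the data query device is a read-only file system.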
为了便于理解本申请实施例提供的数据查询方法,首先介绍本申请实施例中数据查询装置的外部存储器42~2存储的目录结构。
本申请实施例中数据查询装置的外部存储器42~2中,每个分区包括超级区、索引区和数据区。其中,超级区、索引区和数据区可以参考上述图2所示的结构的描述,这里不再进行详细赘述。
与图2所示的结构相同,本申请实施例的数据区中的目录文件也包括n个目录块。不同的是,本申请实施例中,n个目录块中的每个目录块均包括目录项区域和文件名区域,目录项区域包括多个目录项(以m个目录项为例进行说明,m≥1),文件名区域包括多个文件名(以m个文件名为例进行说明),m个目录项与m个文件名一一对应,且m个目录项和m个文件名均按照预设规则顺序排列。由于目录项与文件名一一对应,因此,目录项中记录有与其对应的文件名的存储地址。
需要说明的是,不同的目录块所包括的目录项的数量可以相同,也可以不同,本申请实施例对此不作具体限定。
本申请实施例中的目录项与现有的目录项有略微的不同。现有的目录项包括文件名、索引号和文件类型等信息。本申请实施例中的目录项包括索引号、文件类型和文件名在目录块的偏移量等信息,并不包括文件名。例如,对于索引号为S1的普通文件ABC而言,现有的与文件名“ABC”对应的目录项包括S1、ABC、普通文件类型等信息;本申请实施例中将该文件的文件名存储于文件名区域,若该文件在目录块的偏移量为32,则本申请实施例中与文件名“ABC”对应的目录项包括S1、32、普通文件类型等信息。
为了便于说明,后续内容中的目录项均表示未包括文件名的目录项。
由于文件名的长度是可变的,因此,本申请实施例中目录项和文件名独立存储的结构使得数据查询装置的存储空间得到有效利用,有效的提高了数据查询装置的存储空间的利用率。
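In the embodiments of this application, the filename area is located immediately after the directory entry area. As a result, the in-block filename offset recorded in the first directory entry not only indicates where the first filename is stored but also marks the end of the directory entry area. Because all directory entries in the same directory block have the same size, the data query apparatus can compute the number of directory entries once the first directory entry marks the end of the directory entry area.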
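In one example, Figure 5 shows the partition structure of the storage medium 42 of the data query apparatus in the embodiments of this application. The data area includes n directory blocks (directory block 1, directory block 2, ..., directory block n), and each directory block includes a directory entry area and a filename area. The directory entry area of directory block 1 includes m directory entries (directory entry 1, directory entry 2, ..., directory entry m), and the filename area of directory block 1 includes m filenames (filename 1, filename 2, ..., filename m). Directory entry 1 contains the storage location of filename 1, directory entry 2 contains the storage location of filename 2, and so on up to directory entry m, which contains the storage location of filename m. From the storage location of filename 1 and the start position of the directory block, the data query apparatus can compute that directory block 1 contains m directory entries.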
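The layout described above can be illustrated with a minimal Python sketch. The 8-byte entry format (`<IHBx`: index number, filename offset, file type, one pad byte), the helper names, and the assumption that the block is exactly as long as its contents are all hypothetical choices made for illustration; the source does not specify field widths.

```python
import struct

# Hypothetical fixed-size directory entry: index number, filename offset
# within the block, file type, one pad byte. Field widths are assumptions.
ENTRY = struct.Struct("<IHBx")

def build_block(files):
    """Pack (name, index_number, file_type) tuples into one directory block."""
    files = sorted(files)                    # filenames in lexicographic order
    offset = ENTRY.size * len(files)         # filename area starts right after the entries
    entries, names = b"", b""
    for name, index_number, file_type in files:
        raw = name.encode("ascii")
        entries += ENTRY.pack(index_number, offset, file_type)
        names += raw
        offset += len(raw)
    return entries + names

def entry_count(block):
    """The first entry's filename offset marks the end of the entry area."""
    _, first_name_offset, _ = ENTRY.unpack_from(block, 0)
    return first_name_offset // ENTRY.size

def read_names(block):
    """Recover every filename from the offsets stored in the entries."""
    count = entry_count(block)
    offsets = [ENTRY.unpack_from(block, i * ENTRY.size)[1] for i in range(count)]
    offsets.append(len(block))               # assumes no padding after the last name
    return [block[offsets[i]:offsets[i + 1]].decode("ascii") for i in range(count)]

block = build_block([("ABC", 7, 1), ("ANDY", 3, 1), ("APPLE", 9, 1)])
print(entry_count(block), read_names(block))
```

Because entries have a fixed size and names are stored separately, the entry count falls out of a single division, and variable-length names do not waste space inside the entries.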
由于文件名的长度是变化的,因此,一个目录块包括目录项的数量是不固定的。特别的,若文件A的文件名较长,则目录块可能仅包括一个目录项和文件A的文件名。图6示出了仅包括一个目录项和一个文件名的目录块。
可选的,若本申请实施例中的某一目录块的存储空间不足以容纳该文件名,则可以将该文件名存储于该目录块的下一目录块的首个目录项中,也可以将该文件名存储于某一数据块中,并在该目录块的文件名区域中存储用于指向所述某一数据块的偏移信息。
本申请实施例中的n个目录块以及每个目录块中的所有文件名均按照预设规则顺序排列。
其中,预设规则可以为字典序顺序,也可以为字典序逆序,还可以按照其他有序排列方式,本申请实施例对此不作具体限定。
每个目录块中,目录项与文件名之间一一对应,因此,目录块中的所有目录项的排列顺序与该目录块中的所有文件名的排列顺序相同。
为了便于理解,本申请实施例后续涉及到的预设规则均以字典序顺序为例进行说明。
本申请实施例中n个目录块的存储方式可以为块间顺序存储或块间按照完全二叉树方式存储。
块间顺序存储是指n个目录块是根据n个文件名按照预设规则顺序排列存储的,这里,n个文件名中的每个文件名是指与该文件名对应的目录块中第二预设位置的文件名。
其中,目录块中第二预设位置的文件名可以为该目录块的首个文件名,也可以为该目录块的最后一个文件名,还可以为该目录块中除首个文件名和最后一个文件名之外的其他任一文件名,本申请实施例对此不作具体限定。
为了便于理解,后续均以目录块中第二预设位置的文件名为该目录块中的首个文件名或该目录块中的最后一个文件名为例进行说明。
示例性的,如图7所示,数据查询装置包括的目录块中第一个文件名有ANDY、BABY、CAFE、DASH以及EMMA,按照字典序顺序排列,这5个文件名的排列顺序依次为:ANDY、BABY、CAFE、DASH、EMMA,因此,数据查询装置中的目录块1为第一个文件名为ANDY的目录块,目录块2为第一个文件名为BABY的目录块,目录块3为第一个文件名为CAFE的目录块,目录块4为第一个文件名为DASH的目录块,目录块5为第一个文件名为EMMA的目录块。
块间按照完全二叉树方式存储是指n个目录块是按照完全二叉树方式存储的。本申请实施例中的n个目录块根据每个目录块中第四预设位置的文件名按照所述预设规则排列。
其中,目录块中第四预设位置的文件名可以为该目录块的首个文件名,也可以为该目录块的最后一个文件名,还可以为该目录块中除首个文件名和最后一个文件名之外的其他任一文件名,本申请实施例对此不作具体限定。
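For example, as shown in Figure 8, the filename at the fourth preset position of a directory block is the first filename in that block. The first filenames of the directory blocks in the data query apparatus are ANDY, BABY, CAFE, DASH, and EMMA; in lexicographic order they are arranged as ANDY, BABY, CAFE, DASH, EMMA. After these five filenames are stored as a complete binary tree, their order becomes DASH, BABY, EMMA, ANDY, CAFE. Accordingly, directory block 1 is the block whose first filename is DASH, directory block 2 is the block whose first filename is BABY, directory block 3 is the block whose first filename is EMMA, directory block 4 is the block whose first filename is ANDY, and directory block 5 is the block whose first filename is CAFE.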
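The reordering in this example can be reproduced with a short Python sketch. It assumes the complete binary tree is stored in level order, with the children of position i at positions 2i+1 and 2i+2 (consistent with the j = 2i+1 and j = 2i+2 rule used later in the description); the function name is hypothetical.

```python
def to_complete_tree(sorted_names):
    """Place sorted names into a level-order array so that an in-order walk
    of the implicit complete binary tree visits them in sorted order."""
    n = len(sorted_names)
    tree = [None] * n
    remaining = iter(sorted_names)

    def fill(i):
        if i >= n:
            return
        fill(2 * i + 1)            # left subtree receives the smaller names
        tree[i] = next(remaining)  # then this node
        fill(2 * i + 2)            # right subtree receives the larger names

    fill(0)
    return tree

print(to_complete_tree(["ANDY", "BABY", "CAFE", "DASH", "EMMA"]))
```

With the five names from Figure 8 this yields DASH, BABY, EMMA, ANDY, CAFE, the same order as in the text.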
现以n个目录块的存储方式为块间顺序存储为例说明本申请提供的数据查询方法。具体的,结合上述图4所示的数据查询装置的结构示意图、上述图5所示的数据查询装置的存储介质42的分区结构、上述图7所示的数据查询装置中目录块的一种结构示意图进行详细描述。
具体的,请参见图9,图9为本申请实施例提供的一种数据查询方法的流程示意图。
如图9所示,本申请实施例提供的数据查询方法具体包括如下步骤。
S900、数据查询装置获取文件访问请求,该文件访问请求包括待访问文件名。
可选的,数据查询装置中的某一应用程序在运行过程中触发了获取待访问文件的命令,本申请实施例中将待访问文件的名称统称为待访问文件名。这里,获取待访问文件的命令也称为文件访问请求,该文件访问请求包括待访问文件名。
示例性的,数据查询装置中的某一应用程序在运行过程中触发了命令“file*fp;fp=(“file a”)”,则数据查询装置获取到了访问文件a的请求。
S901、响应于上述文件访问请求,数据查询装置从n个目录块中确定目标目录块。
结合上述描述可知,本申请实施例中的数据查询装置包括n个目录块,每个目录块均包括目录项区域和文件名区域,目录项区域包括至少一个目录项,文件名区域包括至少一个文件名。
本申请实施例中每个目录块中的文件名按照预设规则顺序排序,因此,每个目录块的文件名均可组成一个文件名范围。
示例性的,结合上述图7,若目录块1包括3个文件名:ANDY、APPLE和ATTENT,则该目录块1中的文件名组成的文件名范围为:[ANDY,ATTENT]。
这样,数据查询装置在获取到文件访问请求后,需要首先确定出待访问文件名具体位于哪一目录块的文件名所组成的文件名范围内,进而从确定出的目录块中查找待访问文件名。
具体的,本申请实施例中的数据查询装置从n个目录块中确定目标目录块,即待访问文件名位于目标目录块的文件名所组成的文件名范围内。
容易理解的是,待访问文件名位于目标目录块的文件名所组成的文件名范围内,但待访问文件名可以不属于目标目录块包括的文件名。也就是说,目标目录块可能包括待访问文件名。
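Continuing the previous example, directory block 1 includes three filenames, ANDY, APPLE, and ATTENT, so the filename range formed by the filenames in directory block 1 is [ANDY, ATTENT]. The filename to be accessed, ANGEL, lies within the range [ANDY, ATTENT], but ANGEL is not one of the filenames contained in directory block 1.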
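A minimal sketch of the distinction drawn here, assuming that the "characteristic value" comparison is plain lexicographic string comparison (the function name is hypothetical):

```python
def in_block_range(block_names, target):
    """True if target lies between the first and last filename of the block."""
    return block_names[0] <= target <= block_names[-1]

names = ["ANDY", "APPLE", "ATTENT"]
print(in_block_range(names, "ANGEL"), "ANGEL" in names)
```

Falling inside the range only selects the block to search; whether the file exists is decided by the search inside that block.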
本申请实施例中的目标目录块的目录项区域包括m个目录项,目标目录块的文件名区域包括m个文件名,且m个目录项与m个文件名之间一一对应。目录项包括索引号、文件类型以及文件名在目录块的偏移量等信息。
具体的,数据查询装置确定目标目录块的方法为:
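Step A2: The data query apparatus determines a current third set and a current fourth set according to the binary search algorithm and the n directory blocks. The current third set includes p filenames; the p filenames consist of the filename at the second preset position in each of p directory blocks and are arranged in the order given by the preset rule, where the p directory blocks are consecutive directory blocks among the n directory blocks. The current fourth set includes the p filenames, a fourth filename, and a fifth filename. The fourth filename is the filename that is arranged before, and adjacent to, the first of the p filenames; the fifth filename is the filename that is arranged after, and adjacent to, the last of the p filenames; 1≤p≤n;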
步骤B2:确定待访问文件名与当前第四集合中的文件名之间的第二公共前缀;
步骤C2:从第二公共前缀之后的首位字符起,逐字符对比待访问文件名与第六文件名;其中,第六文件名为当前第三集合中第三预设位置的文件名;
若待访问文件名与第六文件名相同,则数据查询装置确定第六文件名归属的目录块为目标目录块。
从上述集合的定义可以看出,当前第四集合的文件名组成的文件名范围大于当前第三集合的文件名组成的文件名范围,且当前第四集合包括第四文件名以及第五文件名。特殊的,在p=n的情况下,第四文件名与第五文件名为空。
由于当前第四集合的文件名组成的文件名范围大于当前第三集合的文件名组成的文件名范围,因此,当前第三集合中的文件名也存在待访问文件名与当前第四集合中的文件名之间的第二公共前缀,这样的话,数据查询装置在确定出第二公共前缀后,可从第二公共前缀后的首位字符起,逐字符对比待访问文件名与第六文件名,提高了查询的效率。
容易理解的是,在第四文件名与第五文件名为空的情况下,第二公共前缀初始化为空。
具体的,数据查询装置确定待访问文件名与当前第四集合中的文件名之间的第二公共前缀的方法为:数据查询装置确定待访问文件名与第四文件名之间共有的第三前缀,并确定待访问文件名与第五文件名之间共有的第四前缀;然后,该数据查询装置将第三前缀与第四前缀中长度最小的一个确定为第二公共前缀。
在一个示例中,结合上述图7,令当前第四集合为{ANDY、BABY、CAFE、DASH、EMMA},当前第三集合为{BABY、CAFE、DASH},第四文件名为ANDY,第五文件名为EMMA,待访问文件名为CORE,则数据查询装置确定第三前缀为空,第四前缀为空,则该数据查询装置确定第二公共前缀为空。
在另一个示例中,令当前第四集合为{A、AC、ACB、ACD、AD、B、C},当前第三集合为{AC、ACB、ACD、AD、B},若待访问文件名为ACC,第四文件名为A,第五文件名为C,数据查询装置确定第三前缀为A,第四前缀为空,则该数据查询装置确定第二公共前缀为空。
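The rule "take the shorter of the two shared prefixes" can be sketched as follows. The function names are hypothetical, and an empty neighbor (the fourth or fifth filename being empty) is represented by `None`, which is an assumption of this sketch.

```python
def shared_prefix_len(a, b):
    """Length of the longest prefix that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_prefix_len(target, before, after):
    """Shorter of the prefixes shared with the two neighbors of the set."""
    head = shared_prefix_len(target, before) if before is not None else 0
    tail = shared_prefix_len(target, after) if after is not None else 0
    return min(head, tail)

print(common_prefix_len("CORE", "ANDY", "EMMA"))  # both prefixes are empty
print(common_prefix_len("ACC", "A", "C"))         # "A" versus empty
```

Every filename sorted between the two neighbors shares at least this many leading characters with the target, which is why the comparison may start after them.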
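The sixth filename is the filename at the third preset position in the current third set. The filename at the third preset position may be the filename whose ordinal position in the current third set is given by the formula in Figure PCTCN2018075300-appb-000009 (image not reproduced in this text), or it may be a filename at another position in the current third set; this is not specifically limited in the embodiments of this application.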
数据查询装置逐字符对比待访问文件名与第六文件名后,可确定出待访问文件名与第六文件名是否相同。
若待访问文件名与第六文件名相同,则数据查询装置确定第六文件名归属的目录块为目标目录块。可选的,若待访问文件名与第六文件名相同,数据查询装置可以无需再确定目标目录块,该数据查询装置可直接根据第六文件名确定出与待访问文件名对应的目录项,进而根据与待访问文件名对应的目录项,获取到待访问文件的数据。
由于数据查询装置在确定目标目录块的过程中直接找到与待访问文件名相同的文件名的过程较为简单,本申请实施例对这一情况不作详细描述。
若待访问文件名与第六文件名不同,当2≤p≤n时,数据查询装置根据当前第三集合、当前第四集合和二分查找算法,重新确定第三集合和第四集合,并根据重新确定的第三集合和重新确定的第四集合,执行上述步骤B2和上述步骤C2。
结合前面对二分查找的描述可知,重新确定的第三集合包括当前第三集合中排列于第六文件名之前或之后的所有文件名。
具体的,若待访问文件名的特征值大于第六文件名的特征值,则重新确定的第三集合包括当前第三集合中排列于第六文件名之后的所有文件名。若待访问文件名的特征值小于第六文件名的特征值,则重新确定的第三集合包括当前第三集合中排列于第六文件名之前的所有文件名。
示例性的,结合上述图7,令当前第三集合包括的文件名为图7中的ANDY、BABY、CAFE、DASH和EMMA,第六文件名为CAFE。若待访问文件名为ANDY,则重新确定的第三集合包括的文件名为ANDY和BABY。若待访问文件名为DASH,则重新确定的第三集合包括的文件名为DASH和EMMA。
数据查询装置根据重新确定的第四集合执行上述步骤B2,重新确定第二公共前缀。
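In one example, let the current fourth set be {A, AC, ACB, ACD, AD, B, C} and the current third set be {AC, ACB, ACD, AD, B}. Suppose the filename to be accessed is ACC, the sixth filename is ACD, the fourth filename is A, and the fifth filename is C. The data query apparatus determines that the current second common prefix is empty (the current third prefix is "A" and the current fourth prefix is empty, so the current second common prefix is empty) and, according to this current second common prefix, compares the filename to be accessed ACC with the filename ACD character by character starting from the first character. Because the characteristic value of ACC is smaller than that of ACD, the data query apparatus re-determines the third set and the fourth set: the re-determined third set is {AC, ACB} and the re-determined fourth set is {A, AC, ACB, ACD}. The data query apparatus then re-determines the second common prefix from the re-determined fourth set. In the re-determined fourth set the fourth filename is A and the fifth filename is ACD, so while re-determining the second common prefix the third prefix is "A" and the fourth prefix is "AC". The data query apparatus takes the shorter of "A" and "AC" as the second common prefix, that is, the re-determined second common prefix is "A".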
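The narrowing step in this example can be sketched in Python. The sketch assumes that the third set is a closed index interval [lo, hi] over the sorted names, that the fourth set adds one neighbor on each side when one exists, and that characteristic values compare like ordinary strings; the function name is hypothetical.

```python
def narrow(names, lo, hi, mid, target):
    """Return the re-determined third and fourth sets after comparing
    target with names[mid], where names[lo..hi] is the current third set."""
    if target < names[mid]:
        hi = mid - 1               # keep the names before the sixth filename
    else:
        lo = mid + 1               # keep the names after the sixth filename
    third = names[lo:hi + 1]
    fourth = names[max(lo - 1, 0):hi + 2]   # third set plus its two neighbors
    return third, fourth

names = ["A", "AC", "ACB", "ACD", "AD", "B", "C"]
# Current third set is names[1..5]; the sixth filename is names[3] == "ACD".
print(narrow(names, 1, 5, 3, "ACC"))
```

For the values in the text this returns {AC, ACB} as the third set and {A, AC, ACB, ACD} as the fourth set.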
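If the filename to be accessed differs from the sixth filename and p=1, the data query apparatus determines the target directory block from the filename in the current third set. Specifically, when the filename at the second preset position is the first filename of a directory block: if the characteristic value of the filename to be accessed is greater than that of the filename in the current third set, the data query apparatus determines the directory block to which that filename belongs as the target directory block; if the characteristic value of the filename to be accessed is smaller than that of the filename in the current third set, the data query apparatus determines, as the target directory block, the directory block to which the filename immediately preceding the filename in the current third set belongs.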
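In one example, each filename is the first filename of its directory block. Suppose the current third set contains the nine filenames A, B, C, D, E, F, G, H, and I, the sixth filename is E, and the filename to be accessed is EA. The data query apparatus determines the target directory block as follows. ① The second common prefix is empty, so the data query apparatus compares the filename to be accessed EA with the sixth filename E character by character starting from the first character. Because the characteristic value of EA is greater than that of E, the data query apparatus re-determines the third set as {F, G, H, I}. ② Because the third set re-determined in step ① is {F, G, H, I}, the current third set in this step is {F, G, H, I}. If the sixth filename is the filename whose position in the current third set is given by the formula in Figure PCTCN2018075300-appb-000010 (image not reproduced in this text), the sixth filename is G. The second common prefix in this step is empty, so the data query apparatus compares EA with G character by character starting from the first character. Because the characteristic value of EA is smaller than that of G, the data query apparatus re-determines the third set as {F} and the fourth set as {E, F, G}. ③ Because step ② re-determined the third set as {F} and the fourth set as {E, F, G}, in this step the current third set is {F} and the current fourth set is {E, F, G}; the second common prefix determined from {E, F, G} is empty. The data query apparatus compares EA with F character by character starting from the first character. Because the current third set contains only the filename F and the characteristic value of EA is smaller than that of F, the target directory block is the directory block to which E, the filename immediately preceding F, belongs.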
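In another example, with reference to Figure 7 and as shown in Figure 10, suppose the current third set is {ANDY, BABY, CAFE, DASH, EMMA} and the filename to be accessed is CORE. The data query apparatus determines the target directory block as follows. ① At this point the fourth filename and the fifth filename are empty, so the second common prefix is empty. If the sixth filename is CAFE, the data query apparatus, according to this second common prefix, compares CORE with CAFE character by character starting from the first character. Because the characteristic value of CORE is greater than that of CAFE, the data query apparatus re-determines the third set as {DASH, EMMA} and re-determines the second common prefix as empty (the third prefix shared by CORE and CAFE is "C"; the fourth prefix shared by CORE and the fifth filename is empty; therefore the re-determined second common prefix is empty). ② Because the third set re-determined in step ① is {DASH, EMMA}, the current third set in this step is {DASH, EMMA}. If the sixth filename is EMMA, the data query apparatus compares CORE with EMMA character by character starting from the first character according to the current second common prefix. ③ Because the characteristic value of CORE is smaller than that of EMMA, the data query apparatus re-determines the third set as {DASH} and re-determines the second common prefix as empty. Now the current third set is {DASH} and the current second common prefix is empty, so the data query apparatus compares CORE with DASH character by character starting from the first character. Because the current third set contains only the filename DASH and the characteristic value of CORE is smaller than that of DASH, the data query apparatus determines that the target directory block is the directory block to which CAFE, the filename immediately preceding DASH, belongs. ④ The data query apparatus determines that the target directory block is directory block 3.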
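In one example, each filename is the first filename of its directory block. Suppose the current third set is {A, AC, ACB, ACD, AD, B, C}, the fourth filename and the fifth filename are empty, the sixth filename is ACD, and the filename to be accessed is ACC. The data query apparatus determines the target directory block as follows. ① Because the fourth filename and the fifth filename are empty, the current second common prefix is empty, and the data query apparatus compares ACC with the sixth filename ACD character by character starting from the first character. Because the characteristic value of ACC is smaller than that of ACD, the data query apparatus re-determines the third set as {A, AC, ACB} and the fourth set as {empty, A, AC, ACB, ACD}. ② Because step ① re-determined the third set as {A, AC, ACB}, the current third set in step ② is {A, AC, ACB}. If the sixth filename is the filename whose position in the current third set is given by the formula in Figure PCTCN2018075300-appb-000011 (image not reproduced in this text), the sixth filename is AC. Because step ① re-determined the fourth set as {empty, A, AC, ACB, ACD}, the current fourth set in step ② is {empty, A, AC, ACB, ACD}: its fourth filename is empty and its fifth filename is ACD. Accordingly the current third prefix is empty and the current fourth prefix is "AC", so the re-determined second common prefix is empty. According to this second common prefix, the data query apparatus compares ACC with AC character by character starting from the first character. Because the characteristic value of ACC is greater than that of AC, the data query apparatus re-determines the third set as {ACB} and the fourth set as {AC, ACB, ACD}. ③ Because step ② re-determined the third set as {ACB} and the fourth set as {AC, ACB, ACD}, in step ③ the current third set is {ACB} and the current fourth set is {AC, ACB, ACD}. The fourth filename in the current fourth set is AC, so the third prefix is "AC"; the fifth filename in the current fourth set is ACD, so the fourth prefix is "AC". The data query apparatus takes the shorter of the third prefix "AC" and the fourth prefix "AC" as the second common prefix, so the current second common prefix is "AC". According to this second common prefix, the data query apparatus compares ACC with ACB character by character starting from the first character after "AC". Because the current third set contains only the filename ACB and the characteristic value of ACC is greater than that of ACB, the target directory block is the directory block to which ACB belongs.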
特殊的,在第二预设位置的文件名为目录块中首个文件名,第三预设位置的文件名为当前第三集合中的首个文件名,即第六文件名为当前第三集合中的首个文件名的情况下,若待访问文件名的特征值小于第六文件名的特征值,无论p的数值是否大于1,数据查询装置均将位于第六文件名之前且与第六文件名相邻的文件名归属的目录块确定为目标目录块。
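In one example, with reference to Figure 7, suppose the current third set is {ANDY, BABY, CAFE, DASH, EMMA} and the filename to be accessed is CORE. The data query apparatus determines the target directory block as follows. ① At this point the fourth filename and the fifth filename are empty, so the second common prefix is empty. If the sixth filename is CAFE, the data query apparatus, according to this second common prefix, compares CORE with CAFE character by character starting from the first character. Because the characteristic value of CORE is greater than that of CAFE, the data query apparatus re-determines the third set as {DASH, EMMA} and re-determines the second common prefix as empty (the third prefix shared by CORE and CAFE is "C"; the fourth prefix shared by CORE and the fifth filename is empty; therefore the re-determined second common prefix is empty). ② Because the third set re-determined in step ① is {DASH, EMMA}, the current third set in this step is {DASH, EMMA}. If the sixth filename is DASH, the data query apparatus compares CORE with DASH character by character starting from the first character according to the current second common prefix. ③ Because the characteristic value of CORE is smaller than that of DASH, the data query apparatus determines that the target directory block is the directory block to which CAFE, the filename immediately preceding DASH, belongs, that is, directory block 3.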
在第二预设位置的文件名为目录块中首个文件名,第三预设位置的文件名为当前第三集合中的最后一个文件名,即第六文件名为当前第三集合中的最后一个文件名的情况下,若待访问文件名的特征值大于第六文件名的特征值,数据查询装置均将该第六文件名归属的目录块确定为目标目录块。
在一个示例中,结合上述图7,若当前第三集合为{ANDY、BABY、CAFE、DASH、EMMA},待访问文件名为END,第六文件名为当前第三集合中的最后一个文件名EMMA,由于待访问文件名END的特征值大于文件名EMMA的特征值,因此,数据查询装置确定文件名EMMA归属的目录块为目标目录块。
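Optionally, in the scenario in which the n directory blocks are stored sequentially between blocks, if the sixth filename is the filename whose position in the current third set is given by the formula in Figure PCTCN2018075300-appb-000012 (image not reproduced in this text) and, for each directory block, the filename at the second preset position is the first filename in that block, the pseudocode with which the data query apparatus determines the target directory block may be as follows:
[Pseudocode: Figures PCTCN2018075300-appb-000013 and PCTCN2018075300-appb-000014, images not reproduced in this text]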
该代码中的headprefix为第三前缀,endprefix为第四前缀,上述查询闭区间等效于上述第三集合。
可以看出,在确定目标目录块之前,数据查询装置的查询闭区间在不断缩小,且在每一查询闭区间中,数据查询装置均是从公共前缀后的首个字符起,逐字符对比待访问文件名fname与位于该查询闭区间中间位置的文件名dirent0name[mid],有效的提高了查询目标目录块的速率。
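Since the pseudocode figures are not available in this text, the following Python sketch is only a reconstruction from the prose and the worked examples, not the patent's own code. It assumes that the middle of the closed interval [lo, hi] is index (lo + hi) // 2, that characteristic values compare like ordinary strings, and that each entry of `first_names` is the first filename of one directory block; all identifiers are hypothetical.

```python
def find_target_block(first_names, fname):
    """Return the index of the block whose range may contain fname, or -1
    if fname sorts before the first filename of every block."""
    lo, hi = 0, len(first_names) - 1
    head = end = 0        # prefix lengths shared with the names just outside [lo, hi]
    candidate = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        name = first_names[mid]
        k = min(head, end)                 # skip the known common prefix
        while k < len(fname) and k < len(name) and fname[k] == name[k]:
            k += 1
        if k == len(fname) == len(name):
            return mid                     # exact match on a block's first filename
        if k == len(name) or (k < len(fname) and fname[k] > name[k]):
            candidate, head, lo = mid, k, mid + 1   # fname sorts after name
        else:
            end, hi = k, mid - 1                    # fname sorts before name
    return candidate

blocks = ["ANDY", "BABY", "CAFE", "DASH", "EMMA"]
print(find_target_block(blocks, "CORE"))
```

For the Figure 7 names, CORE resolves to index 2 (directory block 3) and END to index 4, matching the worked examples; the `candidate` variable plays the role of "the block of the filename immediately before" when the interval shrinks to nothing.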
数据查询装置在S901中确定出目标目录块后,从该目标目录块中查找与待访问文件名相同的文件名。本申请实施例中将与待访问文件名对应的目录项称为目标目录项。数据查询装置在查找到与待访问文件名相同的文件名后,可确定出目标目录项,进而根据目标目录项获取待访问文件的数据。
具体的,数据查询装置在执行S901后,执行S902,并顺序执行后续步骤。
S902、数据查询装置根据二分查找算法和目标目录块,确定当前第一集合和当前第二集合。
这里,当前第一集合包括m个文件名中连续的x个文件名,当前第二集合包括所述x个文件名、第一文件名以及第二文件名,m≥x≥1。第一文件名为排列于x个文件名中的首个文件名之前且与x个文件名中的首个文件名相邻的文件名,第二文件名为排列于x个文件名中的最后一个文件名之后且与x个文件名中的最后一个文件名相邻的文件名。
目标目录块包括m个文件名,数据查询装置根据二分查找算法在目标目录块中的m个文件名中进行查询。
具体的,数据查询装置根据二分查找算法和目标目录块,确定包括x个文件名的当前第一集合以及包括所述x个文件名、第一文件名以及第二文件名的当前第二集合。
容易理解的是,从上述对二分查找的描述以及图1示出的流程可知,在数据查询装置的查询过程,查询区间不断缩小,且在每一查询区间内数据查询装置均需比较待访问文件名的特征值与该查询区间中间位置的文件名的特征值的大小。为了便于描述,本申请实施例以当前的查询区间为当前第一集合为例进行说明。
特殊的,若x=m,则第一文件名与第二文件名为空。
S903、数据查询装置确定待访问文件名与当前第二集合中的文件名之间的第一公共前缀。
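As the foregoing description of the current first set and the current second set shows, the filename range formed by all filenames in the current second set is larger than the filename range formed by all filenames in the current first set. Therefore, every filename in the current first set also shares the first common prefix with the filename to be accessed.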
容易理解的是,第一文件名与第二文件名为空的情况下,第一公共前缀初始化为空。
具体的,数据查询装置确定第一公共前缀的方法为:数据查询装置确定待访问文件名与第一文件名之间共有的第一前缀,并确定待访问文件名与第二文件名之间共有的第二前缀;然后,该数据查询装置将第一前缀与第二前缀中长度最小的一个确定为第一公共前缀。
在一个示例中,如图11所示,目标目录块包括7个顺序排列的文件名:CAFE、CAGE、CAK、CELL、CORN、DAB、DACE。若当前第一集合为{CAFE、CAGE、CAK、CELL、CORN、DAB、DACE},第一文件名和第二文件名均为空,第三文件名为CELL,待访问文件名为CAK,由于第一文件名和第二文件名均为空,因此,第一前缀和第二前缀均为空,相应的,当前的第一公共前缀为空。
若当前第一集合为{CAFE、CAGE、CAK},则第一文件名为空,第二文件名为CELL,第一文件名与待访问文件名CAK之间共有的第一前缀为空,待访问文件名CAK与第二文件名CELL之间共有的第二前缀为“C”,第一前缀的长度小于第二前缀,因此,当前的第一公共前缀为空。
S904、数据查询装置从第一公共前缀之后的首位字符起,逐字符对比待访问文件名与第三文件名。
其中,第三文件名为当前第一集合中第一预设位置的文件名。
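Optionally, the filename at the first preset position may be the filename whose ordinal position in the current first set is given by the formula in Figure PCTCN2018075300-appb-000015 (image not reproduced in this text), or it may be a filename at another position in the current first set; this is not specifically limited in the embodiments of this application.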
由于当前第一集合中的所有文件名均存在第一公共前缀,因此,数据查询装置在确定出第一公共前缀之后,从第一公共前缀之后的首位字符起,逐字符对比待访问文件名与第三文件名,提高了数据查询装置查找与待访问文件名相同的文件名的速率。
数据查询装置逐字符对比待访问文件名与第三文件名后,可确定出待访问文件名与第三文件名是否相同。
若待访问文件名与第三文件名相同,则说明第三文件名所对应的文件为待访问文件,此时,目标目录项为与第三文件名对应的目录项。数据查询装置根据目标目录项可获取到待访问文件的数据。在这种情况下,数据查询装置在执行S904后,继续执行下述步骤S905。
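If the filename to be accessed differs from the third filename, the data query apparatus needs to determine how the characteristic value of the filename to be accessed compares with that of the third filename, re-determine the first set and the second set according to that relationship, and, based on the re-determined first set and the re-determined second set, perform step B1 and step C1 again (corresponding to S903 and S904 above) until the data of the file to be accessed is obtained or it is determined that the target directory block does not contain the filename to be accessed.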
具体的,若待访问文件名的特征值小于第三文件名的特征值,数据查询装置重新确定的第一集合包括当前第一集合中位于第三文件名之前的所有文件名。若待访问文件名的特征值大于第三文件名的特征值,数据查询装置重新确定的第一集合包括当前第一集合中位于第三文件名之后的所有文件名。
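For example, with reference to Figure 11, let the first set be {CAFE, CAGE, CAK, CELL, CORN, DAB, DACE} and the third filename be CELL. If the filename to be accessed is DACE, its characteristic value is greater than that of the third filename CELL, and the re-determined first set is {CORN, DAB, DACE}. If the filename to be accessed is CAGE, its characteristic value is smaller than that of the third filename CELL, and the re-determined first set is {CAFE, CAGE, CAK}.
Correspondingly, if the characteristic value of the filename to be accessed is smaller than that of the third filename, the second set re-determined by the data query apparatus includes the first filename, all filenames in the current first set that precede the third filename, and the third filename.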
若待访问文件名的特征值大于第三文件名的特征值,数据查询装置重新确定的第二集合包括第三文件名、当前第一集合中位于第三文件名之后的所有文件名以及第二文件名。
S905、数据查询装置根据目标目录项获取待访问文件的数据。
具体的,数据查询装置从目标目录项中获取待访问文件的索引号,并根据待访问文件的索引号获取待访问文件的索引,进而,该数据查询装置根据待访问文件的索引即可获取到待访问文件的数据。
现结合具体示例说明数据查询装置在目标目录块中查找与待访问文件名相同的文件名的过程。
在一个示例中,结合图11,如图12所示,当前第一集合为{CAFE、CAGE、CAK、CELL、CORN、DAB、DACE},第三文件名为CELL,第一文件名和第二文件名均为空,待访问文件名为CAK。数据查询装置查询待访问文件名的过程为:①、此时,第一公共前缀初始化为空,数据查询装置根据该第一公共前缀逐字符对比待访问文件名CAK与文件名CELL;由于待访问文件名CAK的特征值小于文件名CELL的特征值,数据查询装置重新确定第一集合为{CAFE、CAGE、CAK},重新确定第二集合中第一文件名为空,第二文件名为CELL,因此,重新确定的第一公共前缀为空(待访问文件名CAK与文件名CELL之间共有的第二前缀为“C”,待访问文件名CAK与第一文件名之间共有的第一前缀为空,因此,重新确定的第一公共前缀为空);②、在步骤①中重新确定的第一集合为{CAFE、CAGE、CAK},因此,在步骤②中的当前第一集合为{CAFE、CAGE、CAK},相应的,当前的第一公共前缀为空,若第三文件名为CAGE,数据查询装置根据当前的第一公共前缀逐字符对比待访问文件名CAK与文件名CAGE,由于待访问文件名CAK的特征值大于文件名CAGE的特征值,数据查询装置重新确定第一集合为{CAK},重新确定第二集合为{CAGE、CAK、CELL};③、在步骤②中重新确定的第一集合为{CAK},重新确定第二集合为{CAGE、CAK、CELL},因此,在步骤③中的当前第一集合为{CAK},当前第二集合为{CAGE、CAK、CELL},数据查询装置确定当前的第一公共前缀为“C”,数据查询装置从“C”后的首个字符开始逐字符对比待访问文件名CAK与当前第一集合中的文件名CAK。由于当前第一集合中的文件名CAK与待访问文件名CAK相同,因此,数据查询装置确定目标目录项为与CAK对应的目录项。进而,数据查询装置根据目标目录项获取待访问文件的数据。
可选的,本申请实施例中数据查询装置在目标目录块中查询待访问文件名的伪代码可以为如下代码:
Figure PCTCN2018075300-appb-000016
该代码中的headprefix为第一前缀,endprefix为第二前缀,fname为待访问文件名,上述查询闭区间等效于上述第一集合。
可以看出,在找到与待访问文件名相同的文件名之前,数据查询装置的查询区间在不断缩小,且在每一查询区间中,数据查询装置均是从公共前缀后的首个字符起,逐字符对比待访问文件名fname与位于该查询区间中间位置的文件名direntname[mid],有效的提高了查询速率。
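Because the pseudocode figure is not reproduced in this text, the in-block lookup is sketched below in Python as a reconstruction from the description, not as the patent's own code. It assumes the middle position is (lo + hi) // 2 and that characteristic values compare like ordinary strings; identifiers are hypothetical.

```python
def find_in_block(names, fname):
    """Binary search over the sorted filenames of one directory block.
    Returns the index of the matching directory entry, or -1 if absent."""
    lo, hi = 0, len(names) - 1
    head = end = 0        # prefix lengths shared with the first and second filenames
    while lo <= hi:
        mid = (lo + hi) // 2
        name = names[mid]
        k = min(head, end)                 # first common prefix: already known to match
        while k < len(fname) and k < len(name) and fname[k] == name[k]:
            k += 1
        if k == len(fname) == len(name):
            return mid                     # same filename: entry mid is the target entry
        if k == len(name) or (k < len(fname) and fname[k] > name[k]):
            head, lo = k, mid + 1          # continue in the names after names[mid]
        else:
            end, hi = k, mid - 1           # continue in the names before names[mid]
    return -1

block_names = ["CAFE", "CAGE", "CAK", "CELL", "CORN", "DAB", "DACE"]
print(find_in_block(block_names, "CAK"))
```

Tracing CAK reproduces the Figure 12 walk-through: CELL, then CAGE, then CAK, with the final comparison starting after the shared "C".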
当本申请实施例中n个目录块的存储方式为块间按照完全二叉树方式存储时,本申请实施例提供的数据查询方法的流程依旧为图9示出的流程,数据查询装置也可执行S900~S905。但是,在n个目录块的存储方式为块间按照完全二叉树方式存储的场景和n个目录块的存储方式为块间顺序存储的场景中,数据查询装置从n个目录块中确定目标目录块的方法不同。
现在对在n个目录块的存储方式为块间按照完全二叉树方式存储的场景中,数据查询装置从n个目录块中确定目标目录块的方法进行解释。
具体的,在n个目录块的存储方式为块间按照完全二叉树方式存储的场景中,数据查询装置确定目标目录块的方法为:
步骤A3:数据查询装置确定当前候选目录块和当前第三公共前缀;
步骤B3:数据查询装置从当前第三公共前缀之后的首位字符起,逐字符对比待访问文件名与第i个文件名,该第i个文件名为n个目录块中第i个目录块中第四预设位置的文件名,0≤i<n;
若待访问文件名与第i个文件名相同,则数据查询装置重新确定候选目录块为第i个文件名归属的目录块,并将重新确定的候选目录块确定为目标目录块。
第i个文件名为n个目录块中第i个目录块中第四预设位置的文件名。这里,第四预设位置的文件名可以为对应目录块中的首个文件名,也可以为对应目录块中的最后一个文件名,还可以为对应目录块中的其他文件名,本申请实施例对此不作具体限定。
数据查询装置从当前第三公共前缀之后的首位字符起,逐字符对比待访问文件名与第i个文件名之后,可确定出待访问文件名与第i个文件名是否相同。
若待访问文件名与第i个文件名相同,则说明第i个文件名归属的目录块为目标目录块。此时,数据查询装置可直接根据第i个文件名确定出与待访问文件名对应的目录项,进而根据与待访问文件名对应的目录项,获取到待访问文件的数据。
若待访问文件名与第i个文件名不同,则数据查询装置重新确定第三公共前缀、候选目录块以及第i个文件名,并根据重新确定的第三公共前缀、重新确定的候选目录块以及重新确定的第i个文件名,重新执行上述步骤B3,直到确定出目标目录块。
上述重新确定的第i个文件名为n个目录块中第j个目录块中第四预设位置的文件名。其中,若待访问文件名的特征值大于第i个文件名的特征值,j=2i+2;若待访问文件名的特征值小于第i个文件名的特征值,j=2i+1,0≤i<j<n。
与在n个目录块的存储方式为块间顺序存储的场景中数据查询装置确定公共前缀的方法类似,在n个目录块的存储方式为块间按照完全二叉树方式存储的场景中,数据查询装置也确定公共前缀。
在n个目录块的存储方式为块间按照完全二叉树方式存储的场景中,数据查询装置确定第三公共前缀。
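If the filename at the fourth preset position is the first filename in the corresponding directory block, the data query apparatus re-determines the third common prefix as follows. When the characteristic value of the filename to be accessed is greater than that of the i-th filename, the data query apparatus updates the current first target prefix to the prefix shared by the filename to be accessed and the i-th filename, and takes the shorter of the updated first target prefix and the current second target prefix as the re-determined third common prefix. When the characteristic value of the filename to be accessed is smaller than that of the i-th filename, the data query apparatus updates the current second target prefix to the prefix shared by the filename to be accessed and the i-th filename, and takes the shorter of the current first target prefix and the updated second target prefix as the re-determined third common prefix.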
其中,第一目标前缀的长度和第二目标前缀的长度的初始值均为零,且第一目标前缀的长度和第二目标前缀的长度随着待访问文件名的特征值与第i个文件名的特征值的大小关系发生变化。
此外,在第四预设位置的文件名为对应目录块中的首个文件名的情况下,数据查询装置重新确定候选目录块的方法为:若待访问文件名的特征值大于第i个文件名的特征值,则数据查询装置确定候选目录块为第i个文件名归属的目录块;若待访问文件名的特征值小于第i个文件名的特征值,则数据查询装置确定候选目录块与当前候选目录块相同。
容易理解的是,若i=0,当前候选目录块为空,当前第三公共前缀为空。
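The following uses a specific example to describe how the data query apparatus determines the target directory block in the scenario in which the n directory blocks are stored between blocks as a complete binary tree.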
示例性的,结合上述图8,如图13所示,若待访问文件名为CORE,数据查询装置确定目标目录块的过程为:①、候选目录块的初始值为空,第三公共前缀初始值为空,数据查询装置从首位字符起逐字符对比待访问文件名CORE与第0个文件名DASH,由于待访问文件名CORE的特征值小于DASH的特征值,数据查询装置需要进一步比较待访问文件名CORE与第1(2*0+1)个文件名BABY。数据查询装置重新确定的第三公共前缀依旧为空,此时,候选目录块保持不变,依旧为空;②、数据查询装置从首位字符起逐字符对比待访问文件名CORE与第1个文件名BABY,由于待访问文件名CORE的特征值大于BABY的特征值,数据查询装置将候选目录块变更为BABY归属的目录块(即目录块2),此外,该查询装置需要进一步比较待访问文件名CORE与第4(2*1+2)个文件名CAFE。数据查询装置重新确定的第三公共前缀依旧为空。③、数据查询装置从首位字符起逐字符对比待访问文件名CORE与第4个文件名CAFE。由于待访问文件名CORE的特征值大于CAFE的特征值,数据查询装置将候选目录块变更为CAFE归属的目录块(即目录块5);此外,数据查询装置需要进一步比较待访问文件名与第10(2*4+2)个文件名;但是,第10个文件名不存在;因此,数据查询装置确定CAFE归属的目录块为目标目录块,即确定目标目录块为目录块5。
可选的,在n个目录块的存储方式为块间按照完全二叉树方式存储的场景中,对于每个目录块,若第四预设位置的文件名为该目录块中首个文件名,本申请实施例中的数据查询装置确定目标目录块的伪代码可以为如下代码:
Figure PCTCN2018075300-appb-000017
Figure PCTCN2018075300-appb-000018
该代码中的headprefix相当于第一目标前缀,endprefix相当于第二目标前缀。
可以看出,数据查询装置均是从公共前缀后的首个字符起,逐字符对比待访问文件名fname与第i个文件名dirent0name[i],有效的提高了查询目标目录块的速率。
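The pseudocode figures for this scenario are not reproduced in this text, so the following Python sketch is a reconstruction from the description rather than the patent's code. It assumes the first filenames are held in level order with children at 2i+1 and 2i+2, and that characteristic values compare like ordinary strings; identifiers are hypothetical.

```python
def find_block_in_tree(tree_names, fname):
    """Walk the implicit complete binary tree of first filenames.
    Returns the level-order index of the target block, or -1."""
    n = len(tree_names)
    i = 0
    head = end = 0        # first and second target prefix lengths, both start at zero
    candidate = -1        # no candidate block yet
    while i < n:
        name = tree_names[i]
        k = min(head, end)                 # third common prefix: skip it
        while k < len(fname) and k < len(name) and fname[k] == name[k]:
            k += 1
        if k == len(fname) == len(name):
            return i
        if k == len(name) or (k < len(fname) and fname[k] > name[k]):
            candidate, head, i = i, k, 2 * i + 2   # fname is larger: go right
        else:
            end, i = k, 2 * i + 1                  # fname is smaller: go left
    return candidate

tree = ["DASH", "BABY", "EMMA", "ANDY", "CAFE"]
print(find_block_in_tree(tree, "CORE"))
```

For the Figure 8 layout, CORE visits DASH, BABY, and CAFE and ends at index 4, that is, directory block 5, as in the Figure 13 example.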
综上所述,无论n个目录块的存储方式是块间顺序存储,还是块间按照完全二叉树方式存储,本申请实施例提供的数据查找方法均可有效的提高查询待访问文件的速率。
本申请实施例提供一种数据查询装置,该数据查询装置用于执行以上数据查询方法中的数据查询装置所执行的步骤。本申请实施例提供的数据查询装置可以包括相应步骤所对应的模块。
本申请实施例可以根据上述方法示例对数据查询装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
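When functional modules are divided according to the individual functions, Figure 14 shows a possible schematic structure of the data query apparatus involved in the foregoing embodiments. As shown in Figure 14, the data query apparatus 140 includes a processing unit 1400 and an obtaining unit 1401.
The processing unit 1400 is configured to support the data query apparatus 140 in performing S901, S902, S903, S904, and so on in the foregoing embodiments, and/or other processes of the techniques described herein.
The obtaining unit 1401 is configured to support the data query apparatus 140 in performing S900, S905, and so on in the foregoing embodiments, and/or other processes of the techniques described herein.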
其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
当然,本申请实施例提供的数据查询装置140包括但不限于上述模块,例如:数据查询装置还可以包括存储单元1402。
存储单元1402可以用于存储该数据查询装置140的程序代码和数据。
在采用集成的单元的情况下,本申请实施例提供的数据查询装置的结构示意图如图15所示。在图15中,数据查询装置150包括:处理模块1500和通信模块1501。处理模块1500用于对数据查询装置150的动作进行控制管理,例如,执行上述处理单元1400执行的步骤,和/或用于执行本文所描述的技术的其它过程。通信模块1501用于支持数据查询装置150与其他设备之间的交互,例如,执行上述获取单元1401执行的步骤。如图15所示,数据查询装置150还可以包括存储模块1502,存储模块1502用于存储数据查询装置150的程序代码和数据,例如存储上述存储单元1402所保存的内容。
其中,处理模块1500可以是处理器或控制器,例如可以是CPU,通用处理器,DSP,专用集成电路(Application-Specific Integrated Circuit,ASIC),现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信模块1501可以是收发器、RF电路或通信接口等。存储模块1502可以是存储器。
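With reference to Figure 4, the processing module 1500 may be the processor 41 in Figure 4, the communication module 1501 may be the communication interface 40 in Figure 4, and the storage module 1502 may be the storage medium 42 in Figure 4.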
其中,上述方法实施例涉及的各场景的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
上述数据查询装置140和数据查询装置150均可执行上述图9所示的数据查询方法,数据查询装置140和数据查询装置150具体可以是终端。
本申请还提供一种终端,该终端包括:一个或多个处理器、存储器、通信接口。该存储器、通信接口与一个或多个处理器耦合;存储器用于存储计算机程序代码,计算机程序代码包括指令,当一个或多个处理器执行指令时,终端执行本申请实施例的数据查询方法。
这里的终端可以是视频显示设备,智能手机,便携式电脑以及其它可以处理视频或者播放视频的设备。
本申请另一实施例还提供一种计算机可读存储介质,该计算机可读存储介质包括一个或多个程序代码,该一个或多个程序包括指令,当终端中的处理器在执行该程序代码时,该终端执行如图9所示的数据查询方法。
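Another embodiment of this application further provides a computer program product. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. At least one processor of a terminal can read the computer-executable instructions from the computer-readable storage medium, and execution of those instructions by the at least one processor causes the terminal to perform the steps of the data query apparatus in the data query method shown in Figure 9.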
在上述实施例中,可以全部或部分的通过软件,硬件,固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式出现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。
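The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).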
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
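In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.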
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (33)

  1. 一种数据查询方法,其特征在于,应用于包括n个目录块的只读文件系统中,每个目录块包括目录项区域和文件名区域,n≥1,所述数据查询方法包括:
    从所述n个目录块中确定目标目录块,所述目标目录块的目录项区域包括m个目录项,所述目标目录块的文件名区域包括m个文件名,所述m个目录项与所述m个文件名一一对应,所述m个目录项和所述m个文件名均按照预设规则顺序排列,所述待访问文件名位于文件名范围中,所述文件名范围是由所述目标目录块的首个文件名与所述目标目录块的最后一个文件名组成的范围,m≥1;
    步骤A1:根据二分查找算法和目标目录块,确定当前第一集合和当前第二集合,所述当前第一集合包括所述m个文件名中连续的x个文件名,所述当前第二集合包括所述x个文件名、第一文件名以及第二文件名,所述第一文件名为排列于所述x个文件名中的首个文件名之前且与所述x个文件名中的首个文件名相邻的文件名,所述第二文件名为排列于所述x个文件名中的最后一个文件名之后且与所述x个文件名中的最后一个文件名相邻的文件名,m≥x≥1;
    步骤B1:确定所述待访问文件名与所述当前第二集合中的文件名之间的第一公共前缀;
    步骤C1:从所述第一公共前缀之后的首位字符起,逐字符对比所述待访问文件名与第三文件名;其中,所述第三文件名为所述当前第一集合中第一预设位置的文件名;
    若所述待访问文件名与所述第三文件名相同,则根据与所述第三文件名对应的目录项获取待访问文件的数据。
  2. 根据权利要求1所述的数据查询方法,其特征在于,所述数据查询方法还包括:
    若所述待访问文件名与所述第三文件名不同,则根据所述二分查找算法、所述当前第一集合和所述当前第二集合,重新确定第一集合和第二集合,并根据重新确定的第一集合和重新确定的第二集合,执行所述步骤B1和所述步骤C1,直到获取到所述待访问文件的数据或确定所述目标目录块未包括所述待访问文件名。
  3. 根据权利要求2所述的数据查询方法,其特征在于,所述预设规则为字典序顺序;
    所述若所述待访问文件名与所述第三文件名不同,则根据所述二分查找算法、所述当前第一集合和所述当前第二集合,重新确定第一集合和第二集合,具体包括:
    若所述待访问文件名的特征值小于所述第三文件名的特征值,确定所述重新确定的第一集合包括所述当前第一集合中位于所述第三文件名之前的所有文件名,所述重新确定的第二集合包括所述第一文件名、所述当前第一集合中位于所述第三文件名之前的所有文件名以及所述第三文件名;
    或者,
    若所述待访问文件名的特征值大于所述第三文件名的特征值,确定所述重新确定的第一集合包括所述当前第一集合中位于所述第三文件名之后的所有文件名,所述重新确定的第二集合包括所述第三文件名、所述当前第一集合中位于所述第三文件名之后的所有文件名以及所述第二文件名。
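  4. The data query method according to any one of claims 1 to 3, wherein the determining of the first common prefix between the filename to be accessed and the filenames in the current second set specifically comprises: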
    确定所述待访问文件名与所述第一文件名之间共有的第一前缀;
    确定所述待访问文件名与所述第二文件名之间共有的第二前缀;
    将所述第一前缀与所述第二前缀中长度最小的一个确定为所述第一公共前缀。
  5. 根据权利要求1-4中任意一项所述的数据查询方法,其特征在于,所述从所述n个目录块中确定目标目录块,具体包括:
    步骤A2:根据所述二分查找算法和所述n个目录块,确定当前第三集合和当前第四集合;其中,所述当前第三集合包括p个文件名,所述p个文件名包括p个目录块中每个目录块中第二预设位置的文件名,且所述当前第三集合中的文件名按照所述预设规则顺序排列,所述p个目录块为所述n个目录块中连续的目录块,所述当前第四集合包括所述p个文件名、第四文件名以及第五文件名,所述第四文件名为排列于所述p个文件名中的首个文件名之前且与所述p个文件名中的首个文件名相邻的文件名,所述第五文件名为排列于所述p个文件名中的最后一个文件名之后且与所述p个文件名中的最后一个文件名相邻的文件名,1≤p≤n;
    步骤B2:确定所述待访问文件名与所述当前第四集合中的文件名之间的第二公共前缀;
    步骤C2:从所述第二公共前缀之后的首位字符起,逐字符对比所述待访问文件名与第六文件名;其中,所述第六文件名为所述当前第三集合中第三预设位置的文件名;
    若所述待访问文件名与所述第六文件名相同,则确定所述第六文件名归属的目录块为所述目标目录块。
  6. 根据权利要求5所述的数据查询方法,其特征在于,若所述待访问文件名与所述第六文件名不同,所述数据查询方法还包括:
    当2≤p≤n时,根据所述当前第三集合、所述当前第四集合和所述二分查找算法,重新确定第三集合和第四集合,并根据重新确定的第三集合和重新确定的第四集合,执行所述步骤B2和所述步骤C2;
    当p=1时,根据所述当前第三集合包括的文件名确定所述目标目录块。
  7. 根据权利要求6所述的数据查询方法,其特征在于,所述预设规则为字典序顺序,对于每个目录块,所述第二预设位置的文件名为该目录块的首个文件名;
    所述当p=1时,根据所述当前第三集合包括的文件名确定所述目标目录块,具体包括:
    若所述待访问文件名的特征值大于所述当前第三集合中的文件名的特征值,则将所述当前第三集合中的文件名归属的目录块确定为所述目标目录块;
    或者,
    若所述待访问文件名的特征值小于所述当前第三集合中的文件名的特征值,则将位于所述当前第三集合中的文件名之前且与所述当前第三集合中的文件名相邻的文件名归属的目录块确定为所述目标目录块。
  8. 根据权利要求6或7所述的数据查询方法,其特征在于,所述预设规则为字典序顺序;
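    the re-determining of the third set and the fourth set according to the current third set, the current fourth set, and the binary search algorithm specifically comprises: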
    若所述待访问文件名的特征值小于所述第六文件名的特征值,确定所述重新确定的第三集合包括所述当前第三集合中位于所述第六文件名之前的所有文件名,所述重新确定的第四集合包括所述第四文件名、所述当前第三集合中位于所述第六文件名之前的所有文件名以及所述第六文件名;
    或者,
    若所述待访问文件名的特征值大于所述第六文件名的特征值,确定所述重新确定的第三集合包括所述当前第三集合中位于所述第六文件名之后的所有文件名,所述重新确定的第四集合包括所述第六文件名、所述当前第三集合中位于所述第六文件名之后的所有文件名以及所述第五文件名。
  9. 根据权利要求5所述的数据查询方法,其特征在于,所述预设规则为字典序顺序,对于每个目录块,所述第二预设位置的文件名为该目录块的首个文件名,所述数据查询方法还包括:
    若所述第六文件名为所述当前第三集合中的首个文件名,所述待访问文件名的特征值小于所述第六文件名的特征值,则将位于所述第六文件名之前且与所述第六文件名相邻的文件名归属的目录块确定为所述目标目录块;
    若所述第六文件名为所述当前第三集合中的最后一个文件名,所述待访问文件名的特征值大于所述第六文件名的特征值,则将所述第六文件名归属的目录块确定为所述目标目录块。
  10. 根据权利要求6-9中任意一项所述的数据查询方法,其特征在于,所述确定所述待访问文件名与所述当前第四集合中的文件名之间的第二公共前缀,具体包括:
    确定所述待访问文件名与所述第四文件名之间共有的第三前缀;
    确定所述待访问文件名与所述第五文件名之间共有的第四前缀;
    将所述第三前缀与所述第四前缀中长度最小的一个确定为所述第二公共前缀。
  11. 根据权利要求1-4中任意一项所述的数据查询方法,其特征在于,所述n个目录块按照所述预设规则顺序排列,并采用完全二叉树方式存储,所述从所述n个目录块中确定所述目标目录块,具体包括:
    步骤A3:确定当前候选目录块和当前第三公共前缀;
    步骤B3:从所述当前第三公共前缀之后的首位字符起,逐字符对比待访问文件名与第i个文件名,所述第i个文件名为所述n个目录块中第i个目录块中第四预设位置的文件名,0≤i<n;
    若所述待访问文件名与所述第i个文件名相同,则重新确定候选目录块为所述第i个文件名归属的目录块,并将重新确定的候选目录块确定为所述目标目录块。
  12. 根据权利要求11所述的数据查询方法,其特征在于,所述数据查询方法还包括:
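    if the filename to be accessed differs from the i-th filename, re-determining the third common prefix, the candidate directory block, and the i-th filename, wherein the re-determined i-th filename is the filename at the fourth preset position in the j-th directory block among the n directory blocks; wherein if the characteristic value of the filename to be accessed is greater than the characteristic value of the i-th filename, j=2i+2; if the characteristic value of the filename to be accessed is smaller than the characteristic value of the i-th filename, j=2i+1, 0≤i<j<n;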
    根据重新确定的第三公共前缀、重新确定的候选目录块以及所述重新确定的第i个文件名,重新执行所述步骤B3,直到确定出所述目标目录块。
  13. 根据权利要求12所述的数据查询方法,其特征在于,所述预设规则顺序为字典序,所述第四预设位置的文件名为对应目录块中的首个文件名,所述若所述待访问文件名与所述第i个文件名不同,则重新确定候选目录块,具体包括:
    若所述待访问文件名的特征值大于所述第i个文件名的特征值,则确定所述重新确定的候选目录块为所述第i个文件名归属的目录块;
    若所述待访问文件名的特征值小于所述第i个文件名的特征值,则确定所述重新确定的候选目录块为所述当前候选目录块。
  14. 根据权利要求12或13所述的数据查询方法,其特征在于,所述预设规则顺序为字典序,所述第四预设位置的文件名为对应目录块中的首个文件名,所述若所述待访问文件名与所述第i个文件名不同,则重新确定第三公共前缀,具体包括:
    当所述待访问文件名的特征值大于所述第i个文件名的特征值时,将当前第一目标前缀更新为所述待访问文件名与所述第i个文件名之间共有的前缀;将更新后的第一目标前缀与当前第二目标前缀中长度最小的一个确定为所述重新确定的第三公共前缀;
    或者,
    当所述待访问文件名的特征值小于所述第i个文件名的特征值时,将所述当前第二目标前缀更新为所述待访问文件名与所述第i个文件名之间共有的前缀;将所述当前第一目标前缀与更新后的第二目标前缀中长度最小的一个确定为所述重新确定的第三公共前缀;
    其中,所述第一目标前缀的长度和所述第二目标前缀的长度的初始值均为零,且所述第一目标前缀的长度和所述第二目标前缀的长度随着所述待访问文件名的特征值与所述第i个文件名的特征值的大小关系发生变化。
  15. 一种只读文件系统,其特征在于,所述只读文件系统的对象包括目录文件,所述目录文件由n个目录块组成,每个目录块均包括目录项区域和文件名区域,所述目录项区域包括至少一个目录项,所述文件名区域包括至少一个文件名;
    对应同一目录块而言,该目录块中目录项的数量与文件名的数量相同,且该目录块中的所有目录项以及所有文件名均按照预设规则顺序排列。
  16. 根据权利要求15所述的只读文件系统,其特征在于,
    所述至少一个目录项中的每个目录项均包括索引号、文件类型和与该目录项对应的文件名在所归属的目录块的偏移量;
    所述文件名区域与所述目录项区域相邻,且所述文件名区域位于所述目录项区域之后。
  17. 一种数据查询装置,其特征在于,所述数据查询装置具备如权利要求15或16所述的只读文件系统,所述数据查询装置包括:
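    a processing unit, configured to determine a target directory block from the n directory blocks of the read-only file system, wherein the directory entry area of the target directory block includes m directory entries, the filename area of the target directory block includes m filenames, the m directory entries correspond one-to-one to the m filenames, the m directory entries and the m filenames are all arranged in the order given by a preset rule, the filename to be accessed lies within a filename range, and the filename range is the range formed by the first filename of the target directory block and the last filename of the target directory block, m≥1;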
    所述处理单元,还用于执行步骤A1、步骤B1以及步骤C1;其中,
    所述步骤A1为:根据二分查找算法和目标目录块,确定当前第一集合和当前第二集合,所述当前第一集合包括所述m个文件名中连续的x个文件名,所述当前第二集合包括所述x个文件名、第一文件名以及第二文件名,所述第一文件名为排列于所述x个文件名中的首个文件名之前且与所述x个文件名中的首个文件名相邻的文件名,所述第二文件名为排列于所述x个文件名中的最后一个文件名之后且与所述x个文件名中的最后一个文件名相邻的文件名,m≥x≥1;
    所述步骤B1为:确定所述待访问文件名与所述当前第二集合中的文件名之间的第一公共前缀;
    所述步骤C1为:从所述第一公共前缀之后的首位字符起,逐字符对比所述待访问文件名与第三文件名;其中,所述第三文件名为所述当前第一集合中第一预设位置的文件名;
    获取单元,用于若所述处理单元判断出所述待访问文件名与所述第三文件名相同,则根据与所述第三文件名对应的目录项获取待访问文件的数据。
  18. 根据权利要求17所述的数据查询装置,其特征在于,
    所述处理单元,还用于若所述待访问文件名与所述第三文件名不同,则根据所述二分查找算法、所述当前第一集合和所述当前第二集合,重新确定第一集合和第二集合,并根据重新确定的第一集合和重新确定的第二集合,执行所述步骤B1和所述步骤C1,直到所述获取单元获取到所述待访问文件的数据或所述处理单元确定所述目标目录块未包括所述待访问文件名。
  19. 根据权利要求18所述的数据查询装置,其特征在于,所述预设规则为字典序顺序,所述处理单元具体用于:
    若所述待访问文件名的特征值小于所述第三文件名的特征值,确定所述重新确定的第一集合包括所述当前第一集合中位于所述第三文件名之前的所有文件名,所述重新确定的第二集合包括所述第一文件名、所述当前第一集合中位于所述第三文件名之前的所有文件名以及所述第三文件名;
    或者,
    若所述待访问文件名的特征值大于所述第三文件名的特征值,确定所述重新确定的第一集合包括所述当前第一集合中位于所述第三文件名之后的所有文件名,所述重新确定的第二集合包括所述第三文件名、所述当前第一集合中位于所述第三文件名之后的所有文件名以及所述第二文件名。
  20. 根据权利要求17-19中任意一项所述的数据查询装置,其特征在于,所述处理单元具体用于:
    确定所述待访问文件名与所述第一文件名之间共有的第一前缀;
    确定所述待访问文件名与所述第二文件名之间共有的第二前缀;
    将所述第一前缀与所述第二前缀中长度最小的一个确定为所述第一公共前缀。
  21. 根据权利要求17-20中任意一项所述的数据查询装置,其特征在于,
    所述处理单元,还用于执行步骤A2、步骤B2以及步骤C2;其中,
    所述步骤A2为:根据所述二分查找算法和所述n个目录块,确定当前第三集合和当前第四集合;其中,所述当前第三集合包括p个文件名,所述p个文件名包括p个目录块中每个目录块中第二预设位置的文件名,且所述当前第三集合中的文件名按照所述预设规则顺序排列,所述p个目录块为所述n个目录块中连续的目录块,所述当前第四集合包括所述p个文件名、第四文件名以及第五文件名,所述第四文件名为排列于所述p个文件名中的首个文件名之前且与所述p个文件名中的首个文件名相邻的文件名,所述第五文件名为排列于所述p个文件名中的最后一个文件名之后且与所述p个文件名中的最后一个文件名相邻的文件名,1≤p≤n;
    所述步骤B2为:确定所述待访问文件名与所述当前第四集合中的文件名之间的第二公共前缀;
    所述步骤C2为:从所述第二公共前缀之后的首位字符起,逐字符对比所述待访问文件名与第六文件名;其中,所述第六文件名为所述当前第三集合中第三预设位置的文件名;
    所述处理单元,还用于若所述待访问文件名与所述第六文件名相同,则确定所述第六文件名归属的目录块为所述目标目录块。
  22. 根据权利要求21所述的数据查询装置,其特征在于,所述处理单元具体用于:
    当2≤p≤n时,根据所述当前第三集合、所述当前第四集合和所述二分查找算法,重新确定第三集合和第四集合,并根据重新确定的第三集合和重新确定的第四集合,执行所述步骤B2和所述步骤C2;
    当p=1时,根据所述当前第三集合包括的文件名确定所述目标目录块。
  23. 根据权利要求22所述的数据查询装置,其特征在于,所述预设规则为字典序顺序,对于每个目录块,所述第二预设位置的文件名为该目录块的首个文件名;当p=1时,所述处理单元具体用于:
    若所述待访问文件名的特征值大于所述当前第三集合中的文件名的特征值,则将所述当前第三集合中的文件名归属的目录块确定为所述目标目录块;
    或者,
    若所述待访问文件名的特征值小于所述当前第三集合中的文件名的特征值,则将位于所述当前第三集合中的文件名之前且与所述当前第三集合中的文件名相邻的文件名归属的目录块确定为所述目标目录块。
  24. 根据权利要求22或23所述的数据查询装置,其特征在于,所述预设规则为字典序顺序;所述处理单元具体用于:
    若所述待访问文件名的特征值小于所述第六文件名的特征值,确定所述重新确定的第三集合包括所述当前第三集合中位于所述第六文件名之前的所有文件名,所述重新确定的第四集合包括所述第四文件名、所述当前第三集合中位于所述第六文件名之前的所有文件名以及所述第六文件名;
    或者,
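    if the characteristic value of the filename to be accessed is greater than the characteristic value of the sixth filename, determining that the re-determined third set includes all filenames in the current third set that follow the sixth filename, and that the re-determined fourth set includes the sixth filename, all filenames in the current third set that follow the sixth filename, and the fifth filename.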
  25. 根据权利要求21所述的数据查询装置,其特征在于,所述预设规则为字典序顺序,对于每个目录块,所述第二预设位置的文件名为该目录块的首个文件名,所述处理单元还用于:
    若所述第六文件名为所述当前第三集合中的首个文件名,所述待访问文件名的特征值小于所述第六文件名的特征值,则将位于所述第六文件名之前且与所述第六文件名相邻的文件名归属的目录块确定为所述目标目录块;
    若所述第六文件名为所述当前第三集合中的最后一个文件名,所述待访问文件名的特征值大于所述第六文件名的特征值,则将所述第六文件名归属的目录块确定为所述目标目录块。
  26. 根据权利要求22-25中任意一项所述的数据查询装置,其特征在于,所述处理单元具体用于:
    确定所述待访问文件名与所述第四文件名之间共有的第三前缀;
    确定所述待访问文件名与所述第五文件名之间共有的第四前缀;
    将所述第三前缀与所述第四前缀中长度最小的一个确定为所述第二公共前缀。
  27. 根据权利要求17-20中任意一项所述的数据查询装置,其特征在于,所述n个目录块按照所述预设规则顺序排列,并采用完全二叉树方式存储;
    所述处理单元,还用于执行步骤A3和步骤B3;其中,
    所述步骤A3为:确定当前候选目录块和当前第三公共前缀;
    所述步骤B3为:从所述当前第三公共前缀之后的首位字符起,逐字符对比待访问文件名与第i个文件名,所述第i个文件名为所述n个目录块中第i个目录块中第四预设位置的文件名,0≤i<n;
    所述处理单元,还用于若所述待访问文件名与所述第i个文件名相同,则重新确定候选目录块为所述第i个文件名归属的目录块,并将重新确定的候选目录块确定为所述目标目录块。
  28. 根据权利要求27所述的数据查询装置,其特征在于,
    所述处理单元,还用于若所述待访问文件名与所述第i个文件名不同,则重新确定第三公共前缀、候选目录块以及所述第i个文件名,重新确定的第i个文件名为所述n个目录块中第j个目录块中所述第四预设位置的文件名;其中,若所述待访问文件名的特征值大于所述第i个文件名的特征值,j=2i+2;若所述待访问文件名的特征值小于所述第i个文件名的特征值,j=2i+1,0≤i<j<n;
    所述处理单元,还用于根据重新确定的第三公共前缀、重新确定的候选目录块以及所述重新确定的第i个文件名,重新执行所述步骤B3,直到确定出所述目标目录块。
  29. 根据权利要求28所述的数据查询装置,其特征在于,所述预设规则顺序为字典序,所述第四预设位置的文件名为对应目录块中的首个文件名;所述处理单元具体用于:
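    if the characteristic value of the filename to be accessed is greater than the characteristic value of the i-th filename, determining that the re-determined candidate directory block is the directory block to which the i-th filename belongs;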
    若所述待访问文件名的特征值小于所述第i个文件名的特征值,则确定所述重新确定的候选目录块为所述当前候选目录块。
  30. 根据权利要求28或29所述的数据查询装置,其特征在于,所述预设规则顺序为字典序,所述第四预设位置的文件名为对应目录块中的首个文件名;所述处理单元具体用于:
    当所述待访问文件名的特征值大于所述第i个文件名的特征值时,将当前第一目标前缀更新为所述待访问文件名与所述第i个文件名之间共有的前缀;将更新后的第一目标前缀与当前第二目标前缀中长度最小的一个确定为所述重新确定的第三公共前缀;
    或者,
    当所述待访问文件名的特征值小于所述第i个文件名的特征值时,将所述当前第二目标前缀更新为所述待访问文件名与所述第i个文件名之间共有的前缀;将所述当前第一目标前缀与更新后的第二目标前缀中长度最小的一个确定为所述重新确定的第三公共前缀;
    其中,所述第一目标前缀的长度和所述第二目标前缀的长度的初始值均为零,且所述第一目标前缀的长度和所述第二目标前缀的长度随着所述待访问文件名的特征值与所述第i个文件名的特征值的大小关系发生变化。
  31. 一种终端,其特征在于,所述终端包括:一个或多个处理器、存储器、通信接口;
    所述存储器、所述通信接口与所述一个或多个处理器耦合;所述存储器用于存储计算机程序代码,所述计算机程序代码包括指令,当所述一个或多个处理器执行所述指令时,所述终端执行如权利要求1-14中任意一项所述的数据查询方法。
  32. 一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,其特征在于,当所述指令在终端上运行时,使得所述终端执行如权利要求1-14中任意一项所述的数据查询方法。
  33. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在终端上运行时,使得所述终端执行如权利要求1-14中任意一项所述的数据查询方法。
PCT/CN2018/075300 2018-02-05 2018-02-05 一种数据查询方法及装置 WO2019148497A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US16/967,660 US11507533B2 (en) 2018-02-05 2018-02-05 Data query method and apparatus
CN201880036991.5A CN110709824B (zh) 2018-02-05 2018-02-05 一种数据查询方法及装置
PCT/CN2018/075300 WO2019148497A1 (zh) 2018-02-05 2018-02-05 一种数据查询方法及装置
EP18903433.3A EP3736705B1 (en) 2018-02-05 2018-02-05 Date query method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/075300 WO2019148497A1 (zh) 2018-02-05 2018-02-05 一种数据查询方法及装置

Publications (1)

Publication Number Publication Date
WO2019148497A1 true WO2019148497A1 (zh) 2019-08-08

Family

ID=67479490

Country Status (4)

Country Link
US (1) US11507533B2 (zh)
EP (1) EP3736705B1 (zh)
CN (1) CN110709824B (zh)
WO (1) WO2019148497A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949610B (zh) * 2020-09-18 2022-12-23 苏州浪潮智能科技有限公司 一种基于ai训练平台的海量文件检索方法、装置及设备
US11720557B2 (en) * 2021-04-07 2023-08-08 Druva Inc. System and method for on-demand search of a large dataset

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130159250A1 (en) * 2009-01-08 2013-06-20 International Business Machines Corporation Method, Apparatus and Computer Program Product For Maintaining File System Client Directory Caches With Parallel Directory Writes
CN103942205A (zh) * 2013-01-18 2014-07-23 深圳市腾讯计算机系统有限公司 存储、读取目录索引的方法、装置及系统
CN104050251A (zh) * 2014-06-11 2014-09-17 深圳市茁壮网络股份有限公司 一种文件管理方法及管理系统
CN104537016A (zh) * 2014-12-18 2015-04-22 华为技术有限公司 一种确定文件所在分区的方法及装置

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4453217A (en) * 1982-01-04 1984-06-05 Bell Telephone Laboratories, Incorporated Directory lookup method and apparatus
EP0389399A3 (en) * 1989-03-20 1993-01-20 International Business Machines Corporation Directory structure for worm optical media
US5371885A (en) * 1989-08-29 1994-12-06 Microsoft Corporation High performance file system
US5333315A (en) * 1991-06-27 1994-07-26 Digital Equipment Corporation System of device independent file directories using a tag between the directories and file descriptors that migrate with the files
US5963962A (en) * 1995-05-31 1999-10-05 Network Appliance, Inc. Write anywhere file-system layout
ATE195825T1 (de) * 1993-06-03 2000-09-15 Network Appliance Inc Anordnung eines dateisystems zum beschreiben beliebiger bereiche
US5907672A (en) * 1995-10-04 1999-05-25 Stac, Inc. System for backing up computer disk volumes with error remapping of flawed memory addresses
JPH11161533A (ja) * 1997-11-25 1999-06-18 Nec Micom Technology Kk ファイル管理方法、ファイルシステムおよび記録媒体
JP2000357115A (ja) * 1999-06-15 2000-12-26 Nec Corp ファイル検索装置及びファイル検索方法
US6973542B1 (en) * 2000-07-18 2005-12-06 International Business Machines Corporation Detecting when to prefetch inodes and then prefetching inodes in parallel
GB2369465B (en) 2000-11-28 2003-04-02 3Com Corp A method of sorting and retrieving data files
US7509322B2 (en) * 2001-01-11 2009-03-24 F5 Networks, Inc. Aggregated lock management for locking aggregated files in a switched file system
US7058783B2 (en) * 2002-09-18 2006-06-06 Oracle International Corporation Method and mechanism for on-line data compression and in-place updates
US8522205B2 (en) * 2004-05-18 2013-08-27 Oracle International Corporation Packaging multiple groups of read-only files of an application's components into multiple shared libraries
GB2415797B (en) 2004-06-24 2009-02-25 Symbian Software Ltd A method for improving the performance of a file system in a computer device
US8321439B2 (en) * 2004-12-17 2012-11-27 Microsoft Corporation Quick filename lookup using name hash
US7669003B2 (en) * 2005-08-03 2010-02-23 Sandisk Corporation Reprogrammable non-volatile memory systems with indexing of directly stored data files
WO2007019217A1 (en) 2005-08-03 2007-02-15 Sandisk Corporation Nonvolatile memory with block management
US7904492B2 (en) * 2006-03-23 2011-03-08 Network Appliance, Inc. Method and apparatus for concurrent read-only access to filesystem
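CN100485681C (zh) * 2006-03-23 2009-05-06 北京握奇数据系统有限公司 Smart card storage system and method for managing file creation in the system
CN101211338A (zh) 2006-12-29 2008-07-02 上海欣泰通信技术有限公司 Method for fast file pre-allocation and dynamic file box management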
US7917479B2 (en) * 2007-03-20 2011-03-29 Micron Technology, Inc. Non-volatile memory devices, systems including same and associated methods
US8156164B2 (en) * 2007-07-11 2012-04-10 International Business Machines Corporation Concurrent directory update in a cluster file system
CN102024019B (zh) 2010-11-04 2013-03-13 曙光信息产业(北京)有限公司 Suffix-tree-based directory organization method in a distributed file system
US8918621B1 (en) * 2011-09-29 2014-12-23 Emc Corporation Block address isolation for file systems
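CN102385623B (zh) 2011-10-25 2013-08-28 曙光信息产业(北京)有限公司 Directory access method in a distributed file system
KR101977575B1 (ko) * 2012-09-28 2019-05-13 삼성전자 주식회사 Directory entry lookup apparatus and method, and recording medium on which a directory entry lookup program is recorded
CN103473337A (zh) 2013-09-22 2013-12-25 北京航空航天大学 Method for handling massive numbers of directories and files in a distributed storage system
CN103870588B (zh) 2014-03-27 2016-08-31 杭州朗和科技有限公司 Method and apparatus for use in a database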
US9767104B2 (en) * 2014-09-02 2017-09-19 Netapp, Inc. File system for efficient object fragment access
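CN105701096A (zh) 2014-11-25 2016-06-22 腾讯科技(深圳)有限公司 Index generation method, data query method, apparatus, and system
CN105830059B (zh) * 2014-11-28 2019-09-27 华为技术有限公司 File access method, apparatus, and storage device
JP6525804B2 (ja) * 2015-08-07 2019-06-05 キヤノン株式会社 Information processing apparatus, information processing method, and program
CN106649401A (zh) 2015-11-03 2017-05-10 阿里巴巴集团控股有限公司 Data writing method and apparatus in a distributed file system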
US10318649B2 (en) * 2017-04-18 2019-06-11 International Business Machines Corporation Implementing a secondary storage dentry cache
CN111512290B (zh) * 2017-12-27 2023-09-22 华为技术有限公司 File page table management technique

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PANG, HAIFEI ET AL.: "Research on Improvement of File Retrieval Method Based on Linux", FIRE CONTROL & COMMAND CONTROL, vol. 42, no. 2, 28 February 2017 (2017-02-28), pages 145 - 148, XP055627825 *
See also references of EP3736705A4 *

Also Published As

Publication number Publication date
US20210224225A1 (en) 2021-07-22
CN110709824B (zh) 2022-01-14
CN110709824A (zh) 2020-01-17
EP3736705A4 (en) 2020-12-23
EP3736705B1 (en) 2024-09-18
US11507533B2 (en) 2022-11-22
EP3736705A1 (en) 2020-11-11

Similar Documents

Publication Publication Date Title
US10810179B2 (en) Distributed graph database
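CN110275864B (zh) Index building method, data query method, and computing device
CN108897761B (zh) Clustered storage method and apparatus
WO2018064962A1 (zh) Data storage method, electronic device, and computer non-volatile storage medium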
US9367640B2 (en) Method and system for creating linked list, method and system for searching data
US9292567B2 (en) Bulk matching with update
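WO2018205151A1 (zh) Data update method and storage apparatus
CN106599091B (zh) RDF graph structure storage and indexing method based on key-value storage
CN111125120B (zh) Fast indexing method, apparatus, device, and storage medium for streaming data
JP2021089704A (ja) Data query method, apparatus, electronic device, readable storage medium, and computer program
CN110442773A (zh) Node caching method, system, apparatus, and computer medium in a distributed system
WO2019148497A1 (zh) Data query method and apparatus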
Roumelis et al. Parallel processing of spatial batch-queries using xBR+-trees in solid-state drives
JPWO2007020849A1 (ja) Shared-memory multiprocessor system and information processing method thereof
US10558636B2 (en) Index page with latch-free access
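CN113297432B (zh) Method, processor-readable medium, and system for partition splitting and merging
CN103902693A (zh) Method for a read-optimized T-tree index structure for in-memory databases
CN111143373A (zh) Data processing method, apparatus, electronic device, and storage medium
KR101081726B1 (ko) Method for parallel processing of range queries on an R-tree using a GPU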
Yao et al. NV-QALSH: an nvm-optimized implementation of query-aware locality-sensitive hashing
CN115221360A (zh) Tree structure configuration method and system
US9824105B2 (en) Adaptive probabilistic indexing with skip lists
CN108733678B (zh) Data search method, apparatus, and related device
Ge et al. Cinhba: A secondary index with hotscore caching policy on key-value data store
WO2023141987A1 (zh) File reading method and apparatus

Legal Events

Date Code Title Description
NENP Non-entry into the national phase Ref country code: DE
ENP Entry into the national phase Ref document number: 2018903433; Country of ref document: EP; Effective date: 20200805