CN113297138A - Index establishing method, data query method and computing device - Google Patents


Publication number
CN113297138A
Authority
CN
China
Prior art keywords
index
directory
data object
file system
linear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110670882.2A
Other languages
Chinese (zh)
Inventor
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Deepin Technology Co ltd
Original Assignee
Wuhan Deepin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Wuhan Deepin Technology Co ltd
Priority to CN202110670882.2A
Publication of CN113297138A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Abstract

The invention discloses an index establishing method, executed in a computing device and suitable for establishing an index for a file system, wherein the file system comprises a plurality of data objects, and the data objects comprise directories and files. The method comprises the following steps: establishing a linear index of the file system according to the hierarchical relationship of the plurality of data objects, wherein the linear index comprises the names and index information of the data objects of all hierarchies, stored sequentially; the index information of any directory comprises a start tag, and the start tag stores the position information, in the linear index, of the first child data object under the directory; the index information of the last child data object under any directory comprises an end tag, and the end tag stores the position information of the directory in the linear index. The invention also discloses a corresponding data query method and a corresponding computing device.

Description

Index establishing method, data query method and computing device
This application is a divisional application of invention patent application No. 2019105005057, filed on June 11, 2019.
Technical Field
The invention relates to the technical field of data storage and query, in particular to an index establishing method, a data query method and computing equipment.
Background
Finding files based on File names is a common function of File Systems (FS). At present, the following two schemes are mainly used for searching files:
1. According to the file name and the specified directory entered by the user, the subdirectories and files under the specified directory are traversed recursively in real time, and each file is checked against the file name. This method gives accurate search results but takes a long time.
2. The whole file system is traversed at regular intervals, an index is established for each subdirectory and each file, and files are searched through the established index. This method speeds up searching, but the index occupies a large amount of storage space, building the index takes a long time, and it is difficult to keep the query results up to date in real time.
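The first scheme can be illustrated with a minimal Python sketch. It assumes that "matching" means the query string occurs in the file name; the function name find_by_name is hypothetical and is not part of the original disclosure.

```python
import os
import tempfile


def find_by_name(root, query):
    """Recursively walk `root` and return the paths of files whose name contains `query`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: a "match" is a simple substring test on the file name.
            if query in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Every query touches every directory under the specified root, which is why this approach is accurate but slow on large file systems.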
Disclosure of Invention
To this end, the present invention provides an index building method, a data query method and a computing device in an attempt to solve or at least alleviate the above-existing problems.
According to a first aspect of the present invention, there is provided an index building method, executed in a computing device and adapted to build an index for a file system, the file system comprising a plurality of data objects, the plurality of data objects comprising directories and files, the method comprising: establishing a linear index of the file system according to the hierarchical relationship of the plurality of data objects, wherein the linear index comprises the names and index information of the data objects of all hierarchies, stored sequentially; the index information of any directory comprises a start tag, and the start tag stores the position information, in the linear index, of the first child data object under the directory; the index information of the last child data object under any directory includes an end tag, and the end tag stores the position information of the directory in the linear index.
According to a second aspect of the present invention, there is provided a data query method, executed in a computing device and adapted to find, in a file system, a data object whose name includes a specified query word, a linear index having been built for the file system according to the aforementioned index building method, the method comprising: acquiring a query word; searching the linear index for a target data object whose name includes the query word; determining the target directory to which the target data object belongs according to the first end tag located after the target data object in the linear index; and determining a storage path of the target data object according to the target directory, and returning the storage path to the user.
According to a third aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions that, when read and executed by the processor, cause the computing device to perform the index building method or the data query method as described above.
According to a fourth aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the index building method or the data query method as described above.
According to the index establishing scheme, the linear index is established for the file system according to the hierarchical relation between the data objects, the storage space occupied by the index file is reduced by the linear index, and the miniaturization of the index file is realized. The data query scheme of the invention carries out data query based on the established linear index, can greatly improve the query speed of the data object, ensures the real-time performance of the data query and realizes high-efficiency data query.
Furthermore, the technical scheme of the invention also establishes the inverted index of the data object name, determines the target data object with the name including the appointed query word through the inverted index, and determines the storage path of the target data object according to the linear index, thereby further improving the query speed.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of an index building method 200 according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a file system structure according to one embodiment of the invention;
FIG. 4 is a diagram illustrating a linear index corresponding to the file system structure shown in FIG. 3;
FIG. 5 shows a flow diagram of an index building method 200 according to another embodiment of the invention;
FIG. 6 illustrates a diagram of storing an inverted index using a hash chain according to one embodiment of the invention;
FIG. 7 is a diagram illustrating an index update and query process according to one embodiment of the invention;
FIG. 8 shows a flow diagram of a data query method 800 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Aiming at the defects of the prior art, the invention provides an index establishing scheme and a data query scheme. The index establishing scheme of the invention establishes the linear index for the file system according to the hierarchical relation between the data objects, the linear index reduces the storage space occupied by the index file, and the miniaturization of the index file is realized. The data query scheme of the invention carries out data query based on the established linear index, can greatly improve the query speed of the data object, ensures the real-time performance of the data query and realizes high-efficiency data query.
The index establishing method and the data query method are executed in the computing equipment. The computing device may be any device having storage and computing capabilities, and may be, for example, a personal computer such as a desktop computer and a notebook computer, a computer with a higher hardware configuration such as a workstation and a server, or a mobile terminal such as a mobile phone, a tablet computer, and a smart wearable device, but is not limited thereto.
FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the invention. It should be noted that the computing device 100 shown in fig. 1 is only an example, and in practice, the computing device for implementing the index establishing method and the data querying method of the present invention may be any type of device, and the hardware configuration thereof may be the same as or different from that of the computing device 100 shown in fig. 1. In practice, the computing device for implementing the index establishing method and the data query method of the present invention may add or delete hardware components of the computing device 100 shown in fig. 1, and the present invention does not limit the specific hardware configuration of the computing device.
As shown in FIG. 1, in a basic configuration 102, a computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more programs 122, and program data 124. In some implementations, the program 122 can be arranged to execute instructions on an operating system by one or more processors 104 using program data 124.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
In a computing device 100 according to the present invention, the program 122 includes instructions for performing the index building method 200 and the data query method 800 of the present invention. These instructions may cause the processor 104 to perform the index building method 200 and the data query method 800, so as to build a linear index for the file system and implement efficient data query according to the linear index.
FIG. 2 shows a flow diagram of an index building method 200 according to one embodiment of the invention. Method 200 is performed in a computing device (e.g., computing device 100 described above) for establishing a linear index for a file system in the computing device.
A file system is the method and data structure that an operating system uses to identify files on a storage device (usually a magnetic disk, but also a NAND-flash-based solid state disk) or partition, i.e., a method of organizing files on a storage device. In other words, a file system is the software in an operating system that is responsible for managing and storing file information. File systems typically employ a hierarchical structure (also called a tree structure) to store files, where each level of the file system includes one or more nodes, each of which may be a directory or a file. A file may be of any type, i.e., its extension (also called its suffix) may take any value, e.g., .txt, .doc, .xls, etc., and the invention does not limit the types of files stored in the file system. For convenience of description, in the embodiments of the present invention, directories and files are collectively referred to as data objects; that is, a file system includes a plurality of data objects, which include directories and files. In some embodiments, the data objects also include links (e.g., soft links, hard links), which are used to enable sharing of files. Since a link is a mapping of a source directory or source file, the embodiments of the present invention focus on the index building and data query processes for directory-type and file-type data objects. The index building and data query processes for link-type data objects are not described in detail; those skilled in the art can apply the processes described for directories and files to links by analogy.
As shown in fig. 2, the method 200 begins at step S210.
In step S210, a linear index of the file system is established according to the hierarchical relationship of the plurality of data objects; the linear index includes the names and index information of the data objects of each hierarchy, stored sequentially. The index information of any directory comprises a start tag, which stores the position information, in the linear index, of the first child data object under that directory; the index information of the last child data object under any directory includes an end tag, which stores the position information of that directory in the linear index.
In order to build a linear index of the file system, the tree-like file system needs to be traversed step by step. Specifically, starting from a first hierarchy (e.g., a root directory), data objects of each hierarchy are traversed in sequence, and names and index information of the data objects are successively stored in a storage area.
In an embodiment of the present invention, each data object in the file system has index information, and the index information is used to identify the membership of the data object in the file system, so as to facilitate the location of the data object, i.e. to determine the storage path of the data object in the file system.
As previously described, the index information includes a start tag and an end tag. The start tag is an item of index information of a directory-type data object, and stores the position information, in the linear index, of the first child data object under that directory. The position information may be, for example, but is not limited to, the address offset of the data object in the linear index. In particular, when a directory is an empty directory (i.e., the directory does not include any child data objects), a predetermined special identifier, such as, but not limited to, NULL, is stored in the start tag of the directory. Based on the start tag, the data objects contained in a given directory can be conveniently located in the linear index.
The end tag is an item of index information of the last child data object under any directory, and stores the position information of that directory in the linear index. The last child data object may be a directory or a file. The position information may be, for example, but is not limited to, the address offset in the linear index of the parent directory to which the child data object belongs. Based on the end tag, the parent directory to which a data object belongs can be conveniently located in the linear index.
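As a rough illustration of how end tags can be followed to recover a storage path, the Python sketch below scans forward from a data object to the first end tag (as in the second aspect summarized earlier), jumps to the parent directory stored there, and repeats until the top level is reached. The six-entry index is hypothetical, list positions stand in for address offsets, and ROOT is an assumed default value for the end tags of top-level objects.

```python
ROOT = -1  # assumed default stored in the end tag of the last top-level object

# Hypothetical linear index: (name, end_tag). end_tag is None unless the entry
# is the last child of its directory; list positions stand in for offsets.
INDEX = [
    ("d1", None),    # 0: directory under the root
    ("d2", ROOT),    # 1: last object under the root
    ("abc", None),   # 2: file in d1
    ("aef", None),   # 3: file in d1
    ("d3", 0),       # 4: last child of d1, so its end tag points at d1
    ("abef", 1),     # 5: last (only) child of d2
]


def parent_of(index, pos):
    """Return the position of the parent directory of the entry at `pos`."""
    for i in range(pos, len(index)):
        end_tag = index[i][1]
        if end_tag is not None:  # first end tag at or after the entry
            return end_tag
    raise ValueError("malformed index: no end tag found")


def path_of(index, pos):
    """Rebuild the storage path by following end tags up to the root."""
    parts = []
    while pos != ROOT:
        parts.append(index[pos][0])
        pos = parent_of(index, pos)
    return "/" + "/".join(reversed(parts))


def query(index, word):
    """Return the storage paths of all entries whose name contains `word`."""
    return [path_of(index, i) for i, (name, _) in enumerate(index) if word in name]
```

With this index, query(INDEX, "ab") yields ['/d1/abc', '/d2/abef']. The forward scan works because the children of one directory are stored contiguously and the last of them always carries an end tag.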
For example, the directory d1 includes a file f1, a file f2 and a sub-directory d3, and accordingly names and index information of the directory d1, the file f1, the file f2 and the sub-directory d3 are stored in a linear index in sequence. It will be understood by those skilled in the art that since the linear indexes are arranged in the hierarchical order of the file system, in the linear index, the name and index information of the file f1, the name and index information of the file f2, and the name and index information of the subdirectory d3 are sequentially stored adjacent to each other, but the index information of the directory d1 and the name of the file f1 are not necessarily stored adjacent to each other. The index information of the directory d1 includes a start tag, in which the address offset of the first child data object included therein, i.e. the file f1, in the linear index, i.e. the address offset of the name of the file f1 in the linear index, is stored. The subdirectory d3 is the last child data object in the directory d1, and the index information of the subdirectory d3 includes an end tag, and the end tag stores the address offset of the directory d1 in the linear index, that is, the address offset of the name of the directory d1 in the linear index.
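The d1 example above can be sketched in Python as follows. The sibling directory d2 is a hypothetical addition, included only to show that d1 and f1 need not be adjacent; list positions stand in for address offsets, None stands in for NULL, and ROOT is an assumed default value.

```python
ROOT = -1    # assumed default for the end tag of the last top-level object
NULL = None  # stand-in for the special identifier of an empty directory

# Directories carry a "start" key; the last child of a directory carries "end".
INDEX = [
    {"name": "d1", "start": 2},                  # 0: first child (f1) is at position 2
    {"name": "d2", "start": NULL, "end": ROOT},  # 1: hypothetical empty sibling
    {"name": "f1"},                              # 2
    {"name": "f2"},                              # 3
    {"name": "d3", "start": NULL, "end": 0},     # 4: last child of d1
]


def children(index, dir_pos):
    """List the names of the child data objects of the directory at `dir_pos`."""
    pos = index[dir_pos].get("start")
    names = []
    if pos is None:  # empty directory (or not a directory at all)
        return names
    while True:
        entry = index[pos]
        names.append(entry["name"])
        if "end" in entry:  # the last child closes the directory
            return names
        pos += 1
```

Here children(INDEX, 0) returns ['f1', 'f2', 'd3'], and the end tag of d3 (the value 0) leads back to d1.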
According to an embodiment, the index information comprises, in addition to the start tag and the end tag described above, a type identifier of the data object. It should be noted that the present invention does not limit the type identifiers corresponding to the different types of data objects, as long as the type identifiers of different types of data objects are different. For example, the types of data object include file and directory; in order to further save storage space in the linear index, a preset character (e.g., the lowercase letter 't') may be used to identify a file-type data object, and a directory-type data object may be left unidentified.
According to an embodiment, the index information further includes an end identifier, and specifically, the index information of the last child data object under each directory includes an end identifier for indicating the end of the contents of the directory. The ending identifier may be a character string of any length, and the invention is not limited to the specific content of the ending identifier. In one embodiment, the end identifier may be set to the character '0'. For example, the directory d1 includes a file f1, a file f2 and a sub-directory d3, and accordingly names and index information of the directory d1, the file f1, the file f2 and the sub-directory d3 are stored in a linear index in sequence. The subdirectory d3 is the last child data object in the directory d1, and the index information of the subdirectory d3 further includes an end identifier '0'.
The index information includes a plurality of information items, the index information of each data object does not necessarily include all the information items, and the number and types of the information items included in the index information of different data objects may be different. For example, in the foregoing embodiment, the index information includes four types of start tag, end tag, type identifier, and end identifier, and the index information of a directory-type data object includes the start tag but not necessarily the end tag (only if the directory is the last child data object in its parent directory at the same time, the directory has the end tag).
In addition, although the number and the type of the information items included in the index information of different data objects are not necessarily the same, the storage space occupied by each information item is fixed, for example, the start tag and the end tag store fixed-length (for example, four-byte) address offsets, and the type identifier and the end identifier are fixed-length (for example, one-byte) identifiers. Thus, when traversing the tree file system layer by layer to generate a linear index, the index information item included in the data object can be determined according to the type and the membership of the data object, a storage space is allocated to each data object, and a corresponding value is written into the storage space.
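The fixed-size fields can be sketched with Python's struct module. The text does not say how names are delimited or how NULL is encoded, so the NUL terminator after the name, the little-endian byte order, and the NULL_OFFSET value below are all assumptions made only for illustration.

```python
import struct

NULL_OFFSET = 0xFFFFFFFF  # assumed encoding of the NULL start tag


def encode_entry(name, is_dir, start=None, end=None):
    """Serialize one data object: name, then the information items it carries."""
    out = name.encode("utf-8") + b"\x00"  # assumed name terminator
    if is_dir:
        # Start tag: four-byte offset of the first child, or NULL if empty.
        out += struct.pack("<I", NULL_OFFSET if start is None else start)
    else:
        out += b"t"  # one-byte type identifier for files
    if end is not None:
        # Last child of its directory: one-byte end identifier '0',
        # followed by the four-byte end tag (offset of the parent).
        out += b"0" + struct.pack("<I", end)
    return out
```

Because each item has a fixed width, the space needed for an entry is known as soon as its type and its position among its siblings are known, so a start tag can be reserved first and filled in later.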
In the linear index of the present invention, related data objects, i.e., the files and subdirectories in the same directory, are stored adjacently, which is advantageous for memory access optimization. Linear storage also makes the index file convenient to store and read. Furthermore, the linear index of the invention represents a position with a 4-byte address offset instead of the usual 8-byte pointer, thereby saving storage space and reducing both the storage of invalid data and invalid memory accesses.
For example, FIG. 3 shows a schematic diagram of a file system structure, according to one embodiment of the invention. As shown in fig. 3, two directories d1 and d2 are included under the root directory. Directory d1 further includes file abc, file aef, and directory d3, directory d3 being an empty directory; directory d2 further includes file abef. In the embodiment shown in fig. 3, the level of the root directory is denoted as the 0 th level of the tree file system, the levels of the directories d1 and d2 are denoted as the 1 st level, and the levels of the files abc, aef, abef and the directory d3 are denoted as the 2 nd level.
Traversing the file system shown in fig. 3 layer by layer generates a linear index as shown in fig. 4. The linear index sequentially includes a header (also called guide information), names and index information of data objects of layer 1, and names and index information of data objects of layer 2.
The index information of a data object is predefined to include four information items, namely a start tag, an end tag, a type identifier of the data object and an end identifier; the specific meaning of each information item is as described above. When a directory is an empty directory, the preset special identifier NULL is stored in the start tag of the directory. The types of data object are the file type and the directory type; the type identifier of a file-type data object is defined as the character 't', and the type identifier of a directory-type data object is omitted, i.e., directory-type data objects are not identified. In addition, the end identifier is defined as the character '0'.
Specifically, when building the linear index, the header is written first, and then the data objects of the first layer of the file system are traversed. The first data object in the first layer is directory d1, so the name of directory d1 (i.e., the string "d1") is written into the linear index. Since d1 is a directory-type data object and is not the last data object in its parent directory (i.e., the root directory), the index information of directory d1 includes only the start tag s1. The start tag s1 stores the address offset in the linear index of the first child data object of directory d1, i.e., file abc. Since file abc has not yet been traversed, its address offset is unknown; a four-byte storage space may be reserved for the start tag s1, and when file abc is subsequently traversed, its address offset is written into the start tag s1.
Subsequently, the name of the second data object of the first layer, directory d2 (i.e., the string "d2"), is written to the linear index. Since d2 is a directory-type data object and is the last data object under its parent directory (i.e., the root directory), the index information of directory d2 includes a start tag s2, an end identifier '0', and an end tag e1. The start tag s2 stores the address offset in the linear index of the first child data object of directory d2, i.e., file abef. Since file abef has not yet been traversed, its address offset is unknown; a four-byte storage space may be reserved for the start tag s2, and when file abef is subsequently traversed, its address offset is written into the start tag s2. The end tag e1 stores the address offset of the parent directory of d2, i.e., the root directory. Since the root directory is the topmost level of the file system, its address offset need not be specially marked, and a default value may be stored in the end tag e1.
Subsequently, the data objects of the second layer of the file system are traversed. The first data object of the second layer is file abc, and the name of file abc (i.e., the string "abc") is written into the linear index. Since abc is a file-type data object and is not the last data object in its parent directory (i.e., directory d1), the index information of file abc includes only the type identifier 't', which is written to the linear index.
Subsequently, the name of the second data object of the second layer, file aef (i.e., the string "aef"), is written to the linear index. Since aef is a file-type data object and is not the last data object in its parent directory (i.e., directory d1), the index information of file aef includes only the type identifier 't', which is written to the linear index.
Subsequently, the name of the third data object of the second layer, directory d3 (i.e., the string "d3"), is written to the linear index. Since d3 is a directory-type data object and is the last data object in its parent directory (i.e., directory d1), the index information of directory d3 includes a start tag s3, an end identifier '0', and an end tag e2. Since d3 is an empty directory, the special identifier NULL is written in the start tag s3. The end tag e2 stores the address offset of (the name of) its parent directory, directory d1.
Subsequently, the name of the fourth data object of the second layer, file abef (i.e., the string "abef"), is written to the linear index. Since abef is a file-type data object and is the last data object in its parent directory (i.e., directory d2), the index information of file abef includes a type identifier 't', an end identifier '0', and an end tag e3. The end tag e3 stores the address offset of (the name of) its parent directory, directory d2.
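The layer-by-layer walk above can be sketched in Python as a breadth-first traversal over a nested dictionary that stands in for the file system of FIG. 3. The input format, the function name build_linear_index and the ROOT default are assumptions; list positions replace byte offsets and the header is omitted.

```python
from collections import deque

ROOT = -1  # assumed default stored in the end tag of the last top-level object


def build_linear_index(tree):
    """Flatten a nested dict (dict = directory, None = file) layer by layer."""
    index = []
    queue = deque([(ROOT, tree)])  # (position of a directory, its children)
    while queue:
        parent_pos, kids = queue.popleft()
        names = list(kids)
        for i, name in enumerate(names):
            pos = len(index)
            sub = kids[name]
            entry = {"name": name, "is_dir": isinstance(sub, dict)}
            if entry["is_dir"]:
                entry["start"] = None  # reserved; stays NULL if the directory is empty
                if sub:
                    queue.append((pos, sub))
            if i == 0 and parent_pos != ROOT:
                index[parent_pos]["start"] = pos  # backfill the reserved start tag
            if i == len(names) - 1:
                entry["end"] = parent_pos  # last child points back to its parent
            index.append(entry)
    return index


FIG3 = {"d1": {"abc": None, "aef": None, "d3": {}}, "d2": {"abef": None}}
```

For FIG3 the entries come out in the order d1, d2, abc, aef, d3, abef, which is the order of the walk-through above; the start tags of d1 and d2 are filled in only when abc and abef are reached.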
According to an embodiment, as shown in FIG. 5, the index establishing method 200 further includes step S220. In step S220, the substrings included in the name of each data object are determined, and an inverted index is established with each substring as a key and the names of the data objects containing that substring as the value.
For example, table 1 shows substrings included in file abc, file aef, and file abef:
TABLE 1
Data object name    Substrings included in the data object name
abc                 a, b, c, ab, bc, abc
aef                 a, e, f, ae, ef, aef
abef                a, b, e, f, ab, be, ef, abe, bef, abef
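The rows of Table 1 can be reproduced with a short Python sketch that enumerates the distinct contiguous substrings of a name, shortest first. The function name substrings is hypothetical.

```python
def substrings(name):
    """Return the distinct contiguous substrings of `name`, shortest first."""
    found = []
    for length in range(1, len(name) + 1):
        for start in range(len(name) - length + 1):
            piece = name[start:start + length]
            if piece not in found:  # keep each substring only once
                found.append(piece)
    return found
```

For example, substrings("abef") returns ['a', 'b', 'e', 'f', 'ab', 'be', 'ef', 'abe', 'bef', 'abef'], matching the last row of Table 1.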
The inverted index is created with the substrings in Table 1 as keys and the names of the data objects containing those substrings as values, as shown in Table 2. The inverted index includes a plurality of inverted index chains (one for each row of Table 2 other than the header), each corresponding to one substring. Based on the inverted index, the data objects whose names contain a specified substring can be quickly determined.
TABLE 2
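[Table 2 appears only as an image in the original publication. It lists each substring from Table 1 together with the names of the data objects that contain it.]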
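The construction of the inverted index can be sketched in Python as below. The function name build_inverted_index is hypothetical, and a plain dictionary stands in for whatever storage structure an implementation might use.

```python
from collections import defaultdict


def build_inverted_index(names):
    """Map every substring to the list of data object names that contain it."""
    inverted = defaultdict(list)
    for name in names:
        seen = set()
        for i in range(len(name)):
            for j in range(i + 1, len(name) + 1):
                piece = name[i:j]
                if piece not in seen:  # record each name once per substring
                    seen.add(piece)
                    inverted[piece].append(name)
    return dict(inverted)


INVERTED = build_inverted_index(["abc", "aef", "abef"])
```

For the three example files, INVERTED has 15 keys; for instance, "ab" maps to ['abc', 'abef'] and "ef" maps to ['aef', 'abef'].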
In the above embodiment, the file abc, the file aef, and the file abef are taken as examples, and the process of creating the inverted index is described. However, it will be appreciated by those skilled in the art that in practice, an inverted index may be built for all data objects in the file system, i.e. the inverted index includes not only the name of the file-type data object but also the name of the directory-type data object.
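As a rough illustration of step S220, the following Python sketch enumerates the substrings of each name and builds the inverted index as an ordinary dictionary. The function names are hypothetical, and the dictionary is only a stand-in for whatever storage structure an implementation actually uses.

```python
from collections import defaultdict

def substrings(name):
    """Return the distinct substrings of name, shortest first (the ordering used in Table 1)."""
    seen = []
    for length in range(1, len(name) + 1):
        for start in range(len(name) - length + 1):
            sub = name[start:start + length]
            if sub not in seen:
                seen.append(sub)
    return seen

def build_inverted_index(names):
    """Map each substring (key) to the names of the data objects containing it (value)."""
    inverted = defaultdict(list)
    for name in names:
        for sub in substrings(name):
            inverted[sub].append(name)
    return dict(inverted)

inverted_index = build_inverted_index(["abc", "aef", "abef"])
```

Here `inverted_index["ab"]` evaluates to `["abc", "abef"]`. Note that a name of length n has up to n(n+1)/2 substrings, so the index grows quadratically with name length.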
According to one embodiment, in order to further increase the speed of searching for data object names, a hash chain table is used to store the inverted index of the data object names. First, a hash value of each substring is calculated. The hash value is a mapping value of the substring calculated according to some algorithm; the invention does not limit the algorithm used to calculate the hash value. Then, the inverted index is stored in the form of a hash chain table according to the hash values, wherein the hash chain table comprises a plurality of linked lists, each linked list comprises at least one substring and the names of the corresponding data objects, and substrings with the same hash value are located in the same linked list.
FIG. 6 illustrates a diagram of storing an inverted index using a hash chain table according to one embodiment of the invention. As shown in fig. 6, the hash chain table includes two parts: a hash table and linked lists. The hash table is an array in which each element stores a pointer to a linked list, and each linked list contains one or more substrings with the same hash value together with the names of the corresponding data objects. For example, the hash values of the respective substrings of the inverted index shown in Table 2 are calculated. The hash values of the substrings a, f and ef are found to be the same; the linked list corresponding to this hash value is linked list 2, and the substrings a, f and ef and the names of their corresponding data objects are stored in linked list 2. The hash values of the substrings b, ab, be, abe and abef are the same; the corresponding linked list is linked list 3, and the substrings b, ab, be, abe and abef and the names of their corresponding data objects are stored in linked list 3. The hash values of the substrings c, bc and abc are the same; the corresponding linked list is linked list 4, and the substrings c, bc and abc and the names of their corresponding data objects are stored in linked list 4. The hash values of the substrings e, ae, aef and bef are the same; the corresponding linked list is linked list 7, and the substrings e, ae, aef and bef and the names of their corresponding data objects are stored in linked list 7.
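The structure in fig. 6 can be sketched in Python as a bucket array whose elements are chains of (substring, names) pairs. This is only an illustration under stated assumptions: the class name is hypothetical, Python lists stand in for linked lists, and CRC-32 modulo the bucket count is an arbitrary choice of hash function (the source leaves the algorithm open), so the bucket numbers will not match those in fig. 6.

```python
import zlib

class HashChainTable:
    """Inverted index stored as a hash table with one chain per bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        # Assumed hash function: CRC-32 of the UTF-8 bytes, reduced to a bucket number.
        return self.buckets[zlib.crc32(key.encode("utf-8")) % len(self.buckets)]

    def add(self, substring, name):
        """Record that the data object called name contains substring."""
        chain = self._chain(substring)
        for key, names in chain:
            if key == substring:
                if name not in names:
                    names.append(name)
                return
        chain.append((substring, [name]))

    def lookup(self, query):
        """Hash the query, pick the target chain, then match the substring inside it."""
        for key, names in self._chain(query):
            if key == query:
                return list(names)
        return []

table = HashChainTable()
for name in ("abc", "aef", "abef"):
    for i in range(len(name)):
        for j in range(i + 1, len(name) + 1):
            table.add(name[i:j], name)
```

Here `table.lookup("ab")` returns `["abc", "abef"]`. Substrings that collide share a chain, so a lookup still compares the query against each key in the chain before returning names.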
According to one embodiment, the index building method 200 further includes a step of updating the index: when update information of the file system is acquired through a specific interface, the index is updated in real time according to that update information. Specifically, the kernel module monitors operations such as the creation, deletion, and movement of directories and files in the Virtual File System (VFS) of the Linux kernel through the kprobes kernel interface, and records these operations so that the user mode module can retrieve them to update the index information. The process is as follows:
1. a kernel module is loaded into the kernel of the Linux operating system, kernel functions of the virtual file system such as vfs_create are hooked through kprobes, and a /proc/vfs_changes virtual file is created to provide an access interface for the user mode module;
2. an application program creates, moves, or deletes a file through a system call;
3. the kernel module obtains the call parameters and call result of the corresponding virtual file system kernel function through kprobes;
4. when the corresponding kernel function call succeeds, the kernel module records the corresponding file system change and stores it in an internal buffer;
5. the user mode module periodically obtains the file system updates by accessing the /proc/vfs_changes virtual file;
6. the user mode module updates its internal linear index of the file system according to the acquired update information.
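Steps 5 and 6 can be pictured with the following Python sketch of the user mode side. It rests on several assumptions that are not in the source: the record format of /proc/vfs_changes is not specified, so changes are modelled as hypothetical tuples ("create", path), ("delete", path) and ("move", old_path, new_path), and a plain set of paths stands in for the linear index.

```python
def apply_changes(paths, changes):
    """Apply recorded file system changes to an in-memory set of indexed paths."""
    for change in changes:
        op = change[0]
        if op == "create":
            paths.add(change[1])
        elif op == "delete":
            target = change[1]
            # Deleting a directory also drops everything below it.
            paths -= {p for p in paths if p == target or p.startswith(target + "/")}
        elif op == "move":
            old, new = change[1], change[2]
            moved = {p for p in paths if p == old or p.startswith(old + "/")}
            paths -= moved
            # Re-root every moved entry under the new location.
            paths |= {new + p[len(old):] for p in moved}
        else:
            raise ValueError(f"unknown change record: {change!r}")
    return paths

indexed = {"/d1", "/d1/abc", "/d2"}
apply_changes(indexed, [
    ("create", "/d2/abef"),
    ("move", "/d1", "/d3"),
    ("delete", "/d2"),
])
```

After these three records, `indexed` contains only `/d3` and `/d3/abc`. A real implementation would patch the linear index entries in place rather than rewrite a set of full paths.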
FIG. 7 is a schematic structural diagram illustrating a kernel module hooking the virtual file system to obtain file system updates. The name of the file system is anyturing. When a user issues a query, the user mode module traverses the linear index of the entire file system and compares against it; no additional recursive traversal of the file system is needed, and a large number of index files do not need to be stored and merged. As a result, query speed is maintained and the latest search results are obtained in real time.
By performing the index building method 200, a linear index of the file system can be built. Based on the linear index, efficient data query can be achieved.
FIG. 8 shows a flow diagram of a data query method 800 according to one embodiment of the invention. Method 800 is performed in a computing device (e.g., computing device 100 described above) for efficient data querying based on a linear index of an established file system. As shown in fig. 8, the method 800 begins at step S810.
In step S810, a query term is acquired.
Subsequently, in step S820, the target data object whose name includes the query word is looked up in the linear index.
According to one embodiment, when the file system is provided with an inverted index of data object names, a target data object whose name includes the query term may be determined from the inverted index. Specifically, the inverted index uses a substring as a key and the names of the data objects containing that substring as the value. With the query term taken as the target substring, the target data object corresponding to the target substring is looked up in the inverted index.
According to one embodiment, the inverted index is stored in the form of a hash chain table, the hash chain table includes a plurality of linked lists, each linked list includes at least one substring and the names of the data objects corresponding to that substring, and substrings having the same hash value are located in the same linked list. Based on the hash chain table, the target data object may be determined as follows: first, the hash value of the query term is calculated and the target linked list is determined from the hash value; then, with the query term taken as the target substring, the target data object corresponding to the target substring is looked up in the target linked list.
Subsequently, in step S830, the target directory to which the target data object belongs is determined according to the first end tag located after the target data object in the linear index.
Subsequently, in step S840, the storage path of the target data object is determined according to the target directory, and the storage path is returned to the user.
According to one embodiment, the user may specify a search directory while entering the query term. Accordingly, the method 800 further comprises the steps of: acquiring the specified directory; determining the child data objects included in the specified directory according to the start tag of the specified directory; and looking up, among the child data objects, the target data objects whose names include the specified query term.
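A directory-restricted search of this kind might look like the following Python sketch. The list hand-encodes an example linear index under assumptions: list positions stand in for address offsets, "start" and "end" stand in for the start and end tags, the placement of aef under d1 is assumed, and descending into subdirectories is a design choice of this sketch rather than something the source states.

```python
ROOT = -1  # illustrative default value for end tags of root-level entries

LINEAR_INDEX = [
    {"name": "d1", "type": "d", "start": 2},
    {"name": "d2", "type": "d", "start": 5, "end": ROOT},
    {"name": "abc", "type": "f"},
    {"name": "aef", "type": "f"},
    {"name": "d3", "type": "d", "start": None, "end": 0},  # empty directory
    {"name": "abef", "type": "f", "end": 1},
]

def children(index, dir_pos):
    """Follow the start tag, then read siblings until the one carrying an end tag."""
    pos = index[dir_pos]["start"]
    if pos is None:          # empty directory
        return []
    result = []
    while True:
        result.append(pos)
        if "end" in index[pos]:
            return result
        pos += 1

def search_in_directory(index, dir_pos, term):
    """Return names containing term among the objects below the directory at dir_pos."""
    hits, pending = [], [dir_pos]
    while pending:
        for pos in children(index, pending.pop()):
            if term in index[pos]["name"]:
                hits.append(index[pos]["name"])
            if index[pos]["type"] == "d":
                pending.append(pos)   # also search nested directories
    return hits
```

For instance, `search_in_directory(LINEAR_INDEX, 0, "ab")` returns `["abc"]`, while the same query against position 1 (directory d2) returns `["abef"]`.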
The data lookup process of the present invention is described below by taking the linear index shown in fig. 4 and the hash chain table shown in fig. 6 as examples.
First, a query term input by a user is acquired. For example, the query word input by the user is the character string "ab".
The target data objects whose names include "ab" are then looked up in the linear index. This comprises the following steps 1 and 2:
1. Determining the target data objects whose names include "ab": the hash value of the query term "ab" is calculated, and the target linked list in which "ab" is located is determined from the hash value. For example, after calculation, the linked list corresponding to the hash value of "ab" is linked list 3 in fig. 6; the substring "ab" is found in linked list 3, and its inverted index chain is obtained, i.e., the row numbered 6 in Table 2. From the inverted index chain, the target data objects whose names include "ab" are determined to be abc and abef.
2. Looking up the target data objects abc and abef in the linear index shown in fig. 4.
When searching for the target data object abc, the character string "abc" is compared with the names of the data objects stored in the linear index, so as to find the target data object abc in the linear index. Subsequently, the first end tag located after abc in the linear index, e2, is determined. End tag e2 stores the address offset in the linear index of the parent directory to which abc belongs, i.e., directory d1, so the target directory d1 to which the target data object abc belongs can be determined from end tag e2. Subsequently, the first end tag located after the target directory d1, e1, is examined; since a default value is stored in end tag e1, the parent directory of directory d1 is determined to be the root directory. The target data object abc, the directory d1 and the root directory, determined in that order during the linear index search, are arranged in reverse order, giving the storage path /d1/abc for the target data object.
When searching for the target data object abef, the character string "abef" is compared with the names of the data objects stored in the linear index, so as to find the target data object abef in the linear index. Subsequently, the first end tag located after abef in the linear index, e3, is determined. End tag e3 stores the address offset in the linear index of the parent directory to which abef belongs, i.e., directory d2, so the target directory d2 to which the target data object abef belongs can be determined from end tag e3. Subsequently, the first end tag located after the target directory d2, e1, is examined; since a default value is stored in end tag e1, the parent directory of directory d2 is determined to be the root directory. The target data object abef, the directory d2 and the root directory, determined in that order during the linear index search, are arranged in reverse order, giving the storage path /d2/abef for the target data object.
Finally, the storage paths /d1/abc and /d2/abef of the found target data objects abc and abef are returned to the user.
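The path resolution walked through above can be sketched in Python as follows. The list hand-encodes the example linear index under assumptions: list positions stand in for address offsets, the "end" key stands in for an end tag, ROOT stands in for the default value stored in e1, and the placement of aef under d1 is assumed. An object that is itself the last child of its directory carries its own end tag, so the forward scan starts at the object's own position.

```python
ROOT = -1  # illustrative default value meaning "the parent is the root directory"

LINEAR_INDEX = [
    {"name": "d1", "type": "d", "start": 2},
    {"name": "d2", "type": "d", "start": 5, "end": ROOT},   # end tag e1
    {"name": "abc", "type": "f"},
    {"name": "aef", "type": "f"},
    {"name": "d3", "type": "d", "start": None, "end": 0},   # end tag e2 -> d1
    {"name": "abef", "type": "f", "end": 1},                # end tag e3 -> d2
]

def parent_of(index, pos):
    """Scan forward to the first end tag at or after pos and return the parent position."""
    while "end" not in index[pos]:
        pos += 1
    return index[pos]["end"]

def storage_path(index, pos):
    """Collect names from the object up to the root, then reverse them into a path."""
    parts = []
    while pos != ROOT:
        parts.append(index[pos]["name"])
        pos = parent_of(index, pos)
    return "/" + "/".join(reversed(parts))

def find_paths(index, names):
    """Resolve the storage path of every entry whose name is in names."""
    return [storage_path(index, i) for i, e in enumerate(index) if e["name"] in names]

paths = find_paths(LINEAR_INDEX, {"abc", "abef"})
```

Here `paths` is `["/d1/abc", "/d2/abef"]`, matching the worked example. The forward scan stays within one directory because the children of a directory are stored contiguously and only the last of them carries an end tag.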
According to the technical scheme of the invention, a kernel module hooks the kernel functions related to the Linux virtual file system, and this is combined with the linear cache-area index of the user mode module to realize an efficient file name search function. The speed of file name searching can thereby be greatly improved, while search results remain up to date in real time and the index file remains small.
A11: The method of A10, wherein the file system is provided with an inverted index that uses a substring as a key and uses the name of a data object containing the substring as a value, and the step of searching the linear index for a target data object whose name includes the query term includes:
taking the query term as the target substring, and looking up the target data object corresponding to the target substring according to the inverted index.
A12: The method of A11, wherein the inverted index is stored in the form of a hash chain table, the hash chain table includes a plurality of linked lists, each linked list includes at least one substring and the name of the data object corresponding to the substring, and substrings with the same hash value are located in the same linked list;
the step of searching the linear index for a target data object whose name includes the query term includes:
calculating a hash value of the query term, and determining a target linked list according to the hash value;
taking the query term as the target substring, and looking up the target data object corresponding to the target substring in the target linked list.
A13: The method of any of A10-12, further comprising:
acquiring a specified directory;
determining the child data objects included in the specified directory according to the start tag of the specified directory;
and searching the child data objects for a target data object whose name includes the specified query term.
A14: A computing device, comprising:
at least one processor; and
a memory storing program instructions;
the program instructions, when read and executed by the processor, cause the computing device to perform the index building method of any of A1-9 or the data query method of any of A10-13.
A15: A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the index building method of any of A1-9 or the data query method of any of A10-13.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the index building method or the data query method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense with respect to the scope of the invention, as defined in the appended claims.

Claims (10)

1. An index building method, executed in a computing device, adapted to build an index for a file system, the file system comprising a plurality of data objects, the plurality of data objects comprising directories and files, the method comprising:
establishing a linear index of the file system according to the hierarchical relationship of the plurality of data objects, the linear index comprising names and index information of the data objects of the respective hierarchies stored sequentially, wherein,
the index information of any directory comprises a start tag and a type identifier of a data object, wherein the start tag stores the position information of the first child data object in the directory in the linear index;
the index information of the last child data object under any directory includes an end tag and a type identifier of the data object, and the end tag stores the position information of the directory in the linear index.
2. The method of claim 1, wherein the index information of the last child data object under each directory includes an end identifier for indicating an end of contents of the directory.
3. The method of claim 1, wherein the data object further comprises a link.
4. The method according to any one of claims 1 to 3, wherein when the any directory is an empty directory, a preset special identifier is stored in a start tag of the any directory.
5. The method of any of claims 1-4, wherein the location information is an address offset of a data object in the linear index.
6. The method of any one of claims 1-5, wherein the method further comprises:
determining the substrings included in the name of each data object;
and establishing an inverted index by taking the substring as a key and the name of the data object containing the substring as a value.
7. The method of claim 6, wherein the method further comprises:
calculating a hash value of the substring;
and storing the inverted index in a hash chain table form according to the hash value, wherein the hash chain table comprises a plurality of chain tables, the chain table comprises at least one substring and the name of the corresponding data object, and the substrings with the same hash value are positioned in the same chain table.
8. The method of claim 1, wherein the method further comprises:
and when the update information of the file system is acquired through a specific interface, updating the linear index in real time according to the update information of the file system.
9. A data query method, executed in a computing device, the method being adapted to find data objects comprising specified query terms in names in a file system, the file system building a linear index according to the method of any one of claims 1-8, the method comprising:
acquiring a query word;
searching a target data object with a name comprising the query word in the linear index;
determining a target directory to which the target data object belongs according to a first end tag located after the target data object in the linear index;
and determining a storage path of the target data object according to the target directory, and returning the storage path to the user.
10. The method of claim 9, wherein the file system is provided with an inverted index keyed by a substring and valued by a name of a data object containing the substring, and the step of looking up a target data object whose name includes the query word in the linear index comprises:
and searching and determining a target data object corresponding to the target substring according to the inverted index by taking the query word as the target substring.
CN202110670882.2A 2019-06-11 2019-06-11 Index establishing method, data query method and computing device Pending CN113297138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110670882.2A CN113297138A (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910500505.7A CN110275864B (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device
CN202110670882.2A CN113297138A (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910500505.7A Division CN110275864B (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device

Publications (1)

Publication Number Publication Date
CN113297138A true CN113297138A (en) 2021-08-24

Family

ID=67960584

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110670882.2A Pending CN113297138A (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device
CN201910500505.7A Active CN110275864B (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910500505.7A Active CN110275864B (en) 2019-06-11 2019-06-11 Index establishing method, data query method and computing device

Country Status (1)

Country Link
CN (2) CN113297138A (en)

Also Published As

Publication number Publication date
CN110275864A (en) 2019-09-24
CN110275864B (en) 2021-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination