CN107798073B - Method and device for processing data set with tree structure - Google Patents

Publication number
CN107798073B
Authority
CN
China
Prior art keywords
directory
stack
directories
stacks
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710903238.9A
Other languages
Chinese (zh)
Other versions
CN107798073A (en)
Inventor
马彦强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201710903238.9A priority Critical patent/CN107798073B/en
Publication of CN107798073A publication Critical patent/CN107798073A/en
Application granted granted Critical
Publication of CN107798073B publication Critical patent/CN107798073B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees

Abstract

The application provides a method for processing a data set with a tree structure, which comprises the following steps: determining at least two stacks according to a tree structure of a data set, wherein data in the data set is stored in the at least two stacks, and the intersection of nodes of any two stacks in the at least two stacks is an empty set; the at least two stacks are scanned and data in the at least two stacks is processed. According to the method provided by the embodiment, the processor firstly determines at least two stacks from the data set before traversing the data set, so that the processor can scan a plurality of stacks simultaneously, and the efficiency of traversing the data set is improved.

Description

Method and device for processing data set with tree structure
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for processing a data set having a tree structure.
Background
A tree structure is an association among data in which items are related by "one-to-one" or "one-to-many" correspondences. For example, the Filesystem Hierarchy Standard (FHS) defines a file organization with a tree structure: the root directory is the origin of all directories and files, the root directory contains a number of subdirectories and/or files, and each subdirectory may in turn contain next-level subdirectories and/or files. All or part of the root directory, the subdirectories, and the files constitute a data set with a tree structure.
Data in a data set has the attributes of a layer and a stack. For example, all subdirectories of the root directory (the directories one level below the root directory) are located at the same layer of the tree structure, and all grandchild directories of the root directory (the directories two levels below the root directory) are located at another layer. Scanning linearly downward from the root directory, layer by layer, until a directory that contains no subdirectories is reached yields a directory chain. This chain is called a stack, and the directory that contains no subdirectories is called the stack top.
Searching for required data (i.e., target data) in a data set having a tree structure requires a traversal scan of all directories and files starting from the root directory. Scanning methods in the prior art include the depth-first traversal algorithm and the breadth-first traversal algorithm. Both algorithms start from the root directory and traverse the data set layer by layer; the difference is that, after a directory is scanned, the depth-first traversal algorithm preferentially scans the directories of the next layer, whereas the breadth-first traversal algorithm preferentially scans the data of the same layer.
When traversing data sets with small data volumes, the performance of the two algorithms is acceptable. For data sets with large data volumes, however, their traversal efficiency falls far short of the requirement. For example, for data sets comprising tens of millions of directories and files, traversal using either algorithm may even fail because memory is exhausted.
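The difference between the two prior-art traversal orders can be illustrated with a minimal Python sketch. The tree `TREE` and the function names are hypothetical and are not taken from the application; each key is a directory and each value lists its subdirectories.

```python
from collections import deque

# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

def depth_first(tree, root):
    """After scanning a directory, scan its subdirectories first."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push in reverse so the leftmost subdirectory is scanned first.
        stack.extend(reversed(tree[node]))
    return order

def breadth_first(tree, root):
    """After scanning a directory, scan the rest of the same layer first."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(depth_first(TREE, 1))    # [1, 2, 4, 5, 3, 6]
print(breadth_first(TREE, 1))  # [1, 2, 3, 4, 5, 6]
```

Both functions keep every pending directory in a single in-memory container, which is the structure whose growth the background describes as a problem for very large data sets.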
Disclosure of Invention
The application provides a method and a device for processing a data set with a tree structure, which can improve the efficiency of traversing the data set with the tree structure.
In a first aspect, a method for processing a data set having a tree structure is provided, including: generating at least two stacks according to the tree structure of the data set, wherein the intersection of the nodes of any two stacks in the at least two stacks is an empty set; the at least two stacks are scanned and data in the at least two stacks is processed.
According to the method provided by the embodiment, the processor generates at least two stacks according to the tree structure of the data set in the process of traversing the data set, so that the processor can scan the data set in parallel, and the efficiency of traversing the data set is improved. The processor can also delete the scanned stacks in the memory, reduce the memory occupation amount and avoid the condition of failed data set traversal caused by memory exhaustion.
Optionally, the data set includes a plurality of directories, and the generating at least two stacks according to the tree structure of the data set includes: at least two directories of the plurality of directories are used as at least two stack bottom directories to generate at least two stacks.
The at least two directories may be any directory in the plurality of directories, or may be at least two directories selected according to a predetermined rule, and the specific manner of selecting the bottom-of-stack directory is not limited in the present application.
Optionally, the generating at least two stacks by using at least two directories in the plurality of directories as at least two stack bottom directories includes: determining a first stack top directory from the plurality of directories, wherein the first stack top directory is the first directory, in the process of scanning the plurality of directories, in which no subdirectory is found; determining a first stack according to the first stack top directory and the root directory of the plurality of directories, wherein the root directory is the stack bottom directory of the first stack, the first stack top directory is the stack top directory of the first stack, and the first stack belongs to the at least two stacks; and generating a second stack according to the directories other than the directories included in the first stack, wherein the second stack belongs to the at least two stacks, and the stack bottom directory of the second stack is a directory other than the directories included in the first stack.
For example, all directories except the directory included in the first stack may be used as the second stack, and the second stack may be one stack or a plurality of stacks. According to the method provided by the embodiment, the processor can scan the data sets in parallel, and the efficiency of traversing the data sets is improved. The processor can also delete the scanned stacks in the memory, reduce the memory occupation amount and avoid the condition of failed data set traversal caused by memory exhaustion.
Optionally, the generating the second stack according to a directory other than the directory included in the first stack includes: determining a scanning layer of the first stack according to the first stack top directory, wherein the scanning layer is a layer needing to be scanned preferentially in the first stack, the scanning layer is a layer where the non-stack-bottom directory is located, and the scanning layer and the first stack top directory have a preset corresponding relation; and generating a second stack according to M, N and the directories included in the directories in the scanning layer, wherein M is the number of directories capable of being scanned simultaneously in the scanning layer, N is the number of stacks in which the data set can exist simultaneously in the scanning process, the second stack belongs to the at least two stacks, the directories included in the directories in the scanning layer do not belong to the first stack, and M and N are positive integers greater than or equal to 2.
The M and N may be set according to an actual situation of the apparatus implementing the embodiment, for example, when the hardware configuration of the apparatus is higher or the number of tasks to be processed of the apparatus is less, the M and N may be set to larger values, so that the efficiency of traversing the data set and the utilization rate of the apparatus performance may be improved; when the hardware configuration of the device is low or the device has more tasks to be processed, M and N can be set to be smaller values, so that the overload operation of the device caused by the implementation of the embodiment can be avoided.
Optionally, the generating a second stack according to M, N and the directory included in the directory in the scanning layer includes: scanning the first directory; and generating a second stack by taking the subdirectories of the first directory as the bottom directory.
When the scanning layer comprises a plurality of directories, the sub-directories of the directories can be respectively used as the stack bottoms of the sub-stacks to generate the sub-stacks, so that the processor can scan the stacks simultaneously, and the efficiency of traversing the data set is improved.
Optionally, the scanning layer is a layer where a parent directory of the first stack top directory is located.
The closer the scanning layer is to the root directory, the higher the efficiency of traversing the data set. However, the closer the scanning layer is to the root directory, the more directories need to be processed simultaneously and the more difficult the traversal algorithm is to implement. Taking the layer where the parent directory of the first stack top directory is located as the scanning layer improves traversal efficiency while reducing the difficulty of implementing the scheme.
Optionally, the scanning the at least two stacks includes: the at least two stacks are scanned simultaneously by at least two scan threads.
According to the method provided by the embodiment, when the hardware configuration of the device implementing the embodiment is higher or the number of tasks to be processed of the device is less, the device can scan a plurality of stacks in parallel through multiple threads, so that the efficiency of traversing data sets and the utilization rate of the performance of the device are improved.
Optionally, the method further includes: and when any one of the at least two stacks is completely scanned, exiting the stack which is completely scanned.
Exiting a stack that has been completely scanned means deleting that stack from the memory, which reduces memory usage and lowers the probability that traversal of the data set fails because memory is exhausted.
In a second aspect, an apparatus for processing a data set having a tree structure is provided, where the apparatus can implement the functions of the execution device of the method according to the first aspect, and the functions can be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.
In one possible design, the apparatus includes a processor and a memory, and the processor is configured to support the apparatus to execute the method according to the first aspect to achieve the corresponding functions. The apparatus may further comprise a memory for coupling with the processor, which holds program instructions and data necessary for the apparatus, e.g. the memory for storing the above-mentioned data sets having the tree structure.
In a third aspect, a computer-readable storage medium is provided, having stored therein computer program code, which, when executed by a processing unit or processor, causes the processing unit or processor to perform the method of the first aspect.
In a fourth aspect, there is provided a chip having stored therein instructions that, when run on an electronic device, cause the electronic device to perform the method of the first aspect described above.
Drawings
FIG. 1 is a schematic diagram of a data set having a tree structure suitable for use with the present application;
FIG. 2 is a schematic diagram of a method of processing a data set having a tree structure provided herein;
FIG. 3 is a schematic diagram of another data set having a tree structure suitable for use with the present application;
FIG. 4 is a flow chart of a method of processing a data set having a tree structure provided herein;
FIG. 5 is a schematic diagram of one possible processor provided herein;
FIG. 6 is a schematic diagram of another possible processor provided herein.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a data set having a tree structure to which the present application is applicable. Each circle in fig. 1 represents a directory of the tree structure; each directory may store one or more pieces of data or may be an empty directory.
Directory 1 is the root directory, and a tree (i.e., a tree structure) has only one root directory. A directory one level below a given directory is called a subdirectory of that directory, and a directory one level below a subdirectory is called a grandchild directory. For example, directory 2 is a subdirectory of directory 1, and directory 4 is a grandchild directory of directory 1. Accordingly, directory 2 is the parent directory of directory 4, and directory 1 is the grandparent directory of directory 4.
Directories with common parent directories are siblings of each other, e.g., directory 2 is a sibling of directory 3 and directories 5 and 6 are siblings of directory 4.
Directories without subdirectories are called leaf directories, and the leaf directories in fig. 1 include: directory 9, directory 10, directory 13, directory 15, directory 18, directory 19, directory 22, and directory 23.
The directory also has a layer attribute, and the number of layers can be determined by the distance between the directory and the root directory, for example, it can be defined that the layer where the directory 1 is located is the first layer, the layers where the directories 2 and 3 are located are the second layer, and the layers where the directories 4, 5, 6, 7 and 8 are located are the third layer. The number of levels included in a tree, i.e., the depth of the tree, is shown to be 8 in fig. 1.
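The layer, leaf, and depth definitions above can be made concrete with a short Python sketch. The tree below is a hypothetical small example, not the tree of fig. 1, and the names `TREE` and `layers` are assumptions introduced for illustration.

```python
# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [7], 5: [], 6: [], 7: []}

def layers(tree, root):
    """Return {directory: layer}, counting the root directory as layer 1."""
    layer_of = {root: 1}
    pending = [root]
    while pending:
        node = pending.pop()
        for child in tree[node]:
            layer_of[child] = layer_of[node] + 1
            pending.append(child)
    return layer_of

layer_of = layers(TREE, 1)
leaves = sorted(d for d, subs in TREE.items() if not subs)  # no subdirectories
depth = max(layer_of.values())                              # number of layers

print(layer_of[4], leaves, depth)  # 3 [5, 6, 7] 4
```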
Fig. 2 is a method for processing a data set having a tree structure provided in the present application. The method 200 comprises:
S201, generating at least two stacks according to the tree structure of the data set, wherein the intersection of the nodes of any two stacks in the at least two stacks is an empty set.
S202, scanning the at least two stacks and processing data in the at least two stacks.
The apparatus for performing the method 200 is, for example, a processor.
In S201, the processor may determine at least two stacks from the tree structure according to different methods. For example, in fig. 1, directory 1, directory 2, and all directories below directory 2 may be used as one stack, and directory 3 and all directories below directory 3 may be used as another stack. Determining at least two stacks here means that the processor scans the data set stored in the non-volatile memory and, as directories are scanned, reads them into the memory to generate the at least two stacks.
As another alternative example, the processor may select a directory at each level other than the root directory as the bottom of a stack, and still take FIG. 1 as an example, select directory 3 as the bottom of a stack at the second level, directory 5 as the bottom of a stack at the third level, and directory 14 as the bottom of a stack at the fourth level. The principle of selecting the bottom of the stack may be to use the directory with the smallest value of all the entries in the directories of each layer as the bottom of the stack.
The above embodiments are merely examples, and the method for processing a data set having a tree structure provided by the present application is not limited thereto, and any embodiment that determines at least two stacks according to a tree structure falls within the scope of the present application.
It should be noted that a stack has a first-in, last-out characteristic: data pushed onto the stack first can be popped only after the data pushed later has been popped. A stack is therefore identified once its stack bottom is determined. The size of the stack, of course, also depends on the stack top.
Further, the directories in the stack have the attributes of nodes, one node may include one or more directories, and if the tree shown in fig. 1 is regarded as one stack, directory 1 belongs to the root node, directories 2 and 3 belong to nodes different from the root node, and directory 4, directory 5, directory 6, directory 7, and directory 8 belong to another node. If directory 1, directory 2, directory 3, directory 7, directory 8, directory 11 and directory 12 are considered as a stack, directory 1 belongs to the root node, directory 2 and directory 3 belong to a different node from the root node, directory 7 and directory 8 belong to another node, and directory 11 and directory 12 belong to another node than the above-mentioned node.
In the application, the intersection of the nodes of the two stacks generated according to the tree structure is an empty set, that is, there is no directory belonging to the two stacks at the same time, so that the scan efficiency reduction caused by the directory being scanned repeatedly is avoided.
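The empty-intersection requirement can be checked with a minimal Python sketch that follows the first example of S201: one stack holds the root directory, one of its subdirectories, and everything below that subdirectory, and the other stack holds the remaining subdirectory and everything below it. The tree and the helper `subtree` are hypothetical.

```python
# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

def subtree(tree, bottom):
    """Collect `bottom` and every directory below it."""
    found, pending = set(), [bottom]
    while pending:
        node = pending.pop()
        found.add(node)
        pending.extend(tree[node])
    return found

stack_b = subtree(TREE, 3)            # directory 3 and everything below it
stack_a = subtree(TREE, 1) - stack_b  # directory 1, directory 2, and below

print(sorted(stack_a), sorted(stack_b))  # [1, 2, 4, 5] [3, 6]
print(stack_a & stack_b)                 # set(): no directory is in both
```

Because the two sets are disjoint and together cover the whole tree, scanning both stacks visits every directory exactly once.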
In S202, the processor may select to scan at least two stacks simultaneously or scan at least two stacks sequentially according to actual situations.
For example, if the processor currently has more tasks to process, the processor may choose to scan one or two stacks before scanning the other stack, thereby avoiding the processor from running under an overload condition.
For another example, if the number of tasks currently to be processed by the processor is small, the processor may scan the at least two stacks at the same time, so as to improve the processing efficiency of the tree structure and improve the utilization rate of the processing capability of the processor.
Optionally, the data set includes a plurality of directories, and the determining at least two stacks according to the tree structure of the data set includes:
S2011, a first stack top directory is determined from the plurality of directories, where the first stack top directory is the first directory, in the process of scanning the plurality of directories, in which no subdirectory is found.
S2012, determining a first stack according to the first stack top directory and the root directories of the plurality of directories, where the root directory is a stack bottom directory of the first stack, the first stack top directory is a stack top directory of the first stack, and the first stack belongs to the at least two stacks.
S2013, a second stack is generated according to the directories, except the directory included in the first stack, and the second stack belongs to the at least two stacks.
The root directory may be used as the bottom of the stack to generate the first stack, but the size of the stack cannot be determined only by the bottom of the stack, that is, if there is no top of the stack, the whole tree structure belongs to the first stack, and therefore, the top of the first stack (i.e., the first top directory) needs to be found so as to limit the size of the first stack. An example of determining the first top of stack directory is given below.
The processor searches for the first stack top directory according to the depth-first principle. As shown in fig. 1, the processor may scan in the order of directory 1, directory 2, directory 4, and onward to directory 9. Whenever a subdirectory is found, the processor enters that subdirectory and scans it; until the stack top is found, the processor does not scan the other directories on the same layers as directory 2 and directory 4, so that the stack top is found quickly. If the processor finds no subdirectory when scanning directory 9, the processor may treat directory 9 as the first stack top directory. It should be noted that "the processor finds no subdirectory when scanning directory 9" does not necessarily mean that directory 9 contains no subdirectory. One possible case is that the first entry the processor reads in directory 9 is a file, upon which the processor determines that directory 9 has no subdirectory and treats directory 9 as the first stack top directory.
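The search for the first stack top directory can be sketched in Python as a single downward walk that always enters the first subdirectory found. The tree is hypothetical (it is not a transcription of fig. 1), `first_stack` is an assumed name, and the sketch takes "no subdirectory" literally rather than modeling the file-read-first case mentioned above.

```python
# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5], 3: [], 4: [9, 10], 5: [], 9: [], 10: []}

def first_stack(tree, root):
    """Return the chain from the stack bottom (root) to the first stack top."""
    chain = [root]
    # Descend into the first subdirectory until a directory has none;
    # sibling directories on each layer are deliberately not scanned yet.
    while tree[chain[-1]]:
        chain.append(tree[chain[-1]][0])
    return chain

stack_1 = first_stack(TREE, 1)
print(stack_1)      # [1, 2, 4, 9]
print(stack_1[-1])  # 9, the first stack top directory in this example
```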
After the processor determines the first stack, the second stack may be generated by using all directories except the directory included in the first stack as the directory of the second stack, where the second stack may be one stack or a plurality of stacks.
Optionally, generating the second stack from a directory other than the directory included in the first stack comprises:
S20131, determining a scanning layer of the first stack according to the first stack top directory, wherein the scanning layer is a layer needing to be scanned preferentially in the first stack, the scanning layer is a layer where the non-stack-bottom directory is located, and the scanning layer and the first stack top directory have a preset corresponding relation.
S20132, a second stack is generated according to M, N and the directories included in the directories in the scanning layer, M is the number of directories that can be scanned simultaneously by the scanning layer, N is the number of stacks in which a data set can exist simultaneously in the scanning process, the second stack belongs to the at least two stacks in S201, the directories included in the directories in the scanning layer do not belong to the first stack, and M and N are positive integers greater than or equal to 2.
For S20131, the scanning layer may be a layer where the first stack-top directory is located, a layer where a parent directory of the first stack-top directory is located, a layer where a grandparent directory of the first stack-top directory is located, or another layer of the tree structure.
It should be noted that the closer the scanning layer is to the root directory, the higher the efficiency of traversing the data set is, however, the closer the scanning layer is to the root directory, the more difficult the traversal algorithm is to implement, and taking the layer where the parent directory of the first stack-top directory is located as the scanning layer can improve the traversal efficiency and reduce the difficulty of implementing the scheme.
In S20132, M and N may be set according to an actual situation of the apparatus implementing this embodiment, for example, the apparatus is a computer, and when a memory of the computer is large and a processor performance is strong, or when tasks to be processed of the computer are few, M and N may be set to large values, so that efficiency of traversing a data set and a utilization rate of the apparatus performance may be improved. When the memory of the computer is small and the performance of the processor is weak, or when the tasks to be processed of the computer are more, the M and the N can be set to be small values, so that the computer can be prevented from running in an overload mode due to the implementation of the embodiment.
Furthermore, M and N are upper limits only. M is the maximum number of directories in the scanning layer that can be scanned simultaneously, and N is the maximum number of stacks of the data set that can exist simultaneously during the scanning process. The number of directories actually scanned at the same time may be less than M, and the number of stacks actually existing at the same time may be less than N.
In S20132, "directory included in directory in scan layer" refers to all directories included in directory in scan layer, and taking fig. 1 as an example, assuming that directory 6 is a directory in scan layer, directories included in directory 6 refer to directories 14 to 23.
Optionally, the generating a second stack according to M and N and the directory included in the directory in the scanning layer includes:
S201321, scan the first directory.
S201322, create a second stack using the subdirectories of the first directory as bottom directories.
When the scanning layer comprises a plurality of directories, the subdirectories of the directories can be respectively used as the stack bottoms of a plurality of sub-stacks, so that the processor can scan a plurality of stacks simultaneously, and the efficiency of traversing the data set is improved.
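One possible reading of S20131 and S20132 is sketched below in Python, assuming the scanning layer is the layer of the stack top's parent directory and that the first stack counts toward N. The tree is hypothetical and only loosely modeled on the worked example; `spawn_stacks`, `TREE`, and `PARENT` are names introduced for illustration, and the stack top is assumed to lie at least two layers below the root directory.

```python
# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5, 6], 3: [], 4: [9, 10], 5: [13], 6: [15, 16],
        9: [], 10: [], 13: [], 15: [], 16: []}
PARENT = {child: parent for parent, subs in TREE.items() for child in subs}

def spawn_stacks(tree, parent, stack_top, m, n):
    """Return the stack-bottom lists of the new stacks (at most n - 1)."""
    scan_parent = parent[stack_top]             # parent of the stack top
    # Scanning layer: directories sharing scan_parent's parent, at most M.
    scan_layer = tree[parent[scan_parent]][:m]
    new_stacks = []
    for directory in scan_layer:
        if len(new_stacks) + 1 >= n:            # the first stack counts toward N
            break
        # Subdirectories of scan_parent stay in the first stack.
        if directory != scan_parent and tree[directory]:
            new_stacks.append(list(tree[directory]))
    return new_stacks

print(spawn_stacks(TREE, PARENT, 9, m=3, n=4))  # [[13], [15, 16]]
print(spawn_stacks(TREE, PARENT, 9, m=3, n=2))  # [[13]]
```

Lowering `n` caps how many stacks coexist, and lowering `m` narrows the slice of the scanning layer that is considered, which mirrors the tuning of M and N to the capacity of the device.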
Optionally, the scanning the at least two stacks comprises:
S208, the at least two stacks are scanned simultaneously by at least two scanning threads, for example, the at least two scanning threads and the at least two stacks are in one-to-one correspondence.
According to the method provided by the embodiment, when the hardware configuration of the device implementing the embodiment is higher or the number of tasks to be processed of the device is less, the device can scan a plurality of stacks in parallel through multiple threads, so that the efficiency of traversing data sets and the utilization rate of the performance of the device are improved.
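A one-to-one pairing of scanning threads and stacks could look like the following Python sketch, which uses the standard-library `ThreadPoolExecutor`. The in-memory tree stands in for a real file system, and `scan_stack` and the choice of stack bottoms are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree: directory -> list of subdirectories.
TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

def scan_stack(tree, bottoms):
    """Scan one stack: visit its bottom directories and everything below."""
    visited, pending = [], list(bottoms)
    while pending:
        directory = pending.pop()
        visited.append(directory)
        pending.extend(tree[directory])
    return visited

# Two disjoint stacks whose bottoms are directory 2 and directory 3.
stacks = [[2], [3]]

# One worker thread per stack, matching the one-to-one correspondence.
with ThreadPoolExecutor(max_workers=len(stacks)) as pool:
    results = list(pool.map(lambda bottoms: scan_stack(TREE, bottoms), stacks))

print([sorted(r) for r in results])  # [[2, 4, 5], [3, 6]]
```

Because the stacks share no directory, the threads need no locking around the per-stack `pending` lists.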
Optionally, the method 200 further comprises:
S209, when any one of the at least two stacks is scanned, exiting the scanned stack.
Exiting a stack that has been completely scanned means deleting that stack from the memory, which reduces memory usage and lowers the probability that traversal of the data set fails because memory is exhausted.
The foregoing has generally described the method for processing a data set having a tree structure provided by the present application, and a specific example is shown below in conjunction with fig. 3.
Fig. 3 shows another data set having a tree structure to which the present application is applicable. Each circle represents a directory, and each directory may contain both subdirectories and files, only subdirectories, or only files. The relationships among the directories described for fig. 1 also apply to the directories shown in fig. 3.
First, M is set to 3, N is set to 4, and the scan layer is set to the layer where the parent directory of the top-of-stack directory is located. That is, the data set shown in FIG. 3 produces a maximum of 4 stacks during the scanning process, and the scanning layer scans a maximum of 3 directories simultaneously.
Second, a stack top directory is determined. Starting from directory 1, directories are scanned according to the depth-first principle until directory 9 is scanned. Since directory 9 contains only one file, directory 9 is a stack top directory, namely the top of stack [1].
It is determined that the layer at which the parent directory of directory 9 is located is the scan layer of stack [1], i.e., the layer at which directory 4 is located is the scan layer. It should be noted that, since the parent directories of directory 7 and directory 8 are different from the parent directory of directory 4, directory 7 and directory 8 are not located in the scanning layer of stack [1], that is, the directories included in the scanning layer have the same parent directory. Since M has a value of 3, the processor can scan directory 4, directory 5, and directory 6 simultaneously.
The subdirectory of directory 5 is taken as the bottom of a sub-stack (i.e., stack [2]); the subdirectory of directory 6 is taken as the bottom of another sub-stack (i.e., stack [3]).
Since N has a value of 4, the tree structure shown in fig. 3 can generate one more sub-stack.
According to the method for determining the stack top directory, the search for stack top directories can continue in stack [1], stack [2], and stack [3]. The only directory remaining in stack [1], directory 10, has no subdirectories, so no scanning layer can be determined from directory 10 and stack [1] cannot generate another sub-stack. Similarly, stack [2] contains only directory 13, which has no subdirectories, so stack [2] cannot generate a sub-stack either. In stack [3], the parent directory of directory 15 is at the bottom of the stack, so directory 15 cannot be a stack top directory. Directory 18 contains no subdirectory and its parent directory is not at the bottom of the stack, so directory 18 is a stack top directory and directory 16 is in the scanning layer. Since M is 3, directory 15, directory 16, and directory 17 can be scanned simultaneously. Directory 15 has no subdirectory and therefore cannot form a sub-stack. Directory 17 has subdirectories and can therefore form a 4th sub-stack, with the subdirectories of directory 17 as the bottom of the 4th sub-stack (i.e., stack [4]).
The four stacks can be scanned simultaneously, and each stack is exited once it has been scanned. For example, stack [2] is exited after being scanned, after which a new stack can be generated. When the scanning layer and all the directories below it have been scanned, scanning of the layer above the scanning layer, namely the layer where directory 2 is located, begins.
The above embodiments are merely examples, and the method for processing a data set having a tree structure provided by the present application is not limited thereto.
FIG. 4 is a flow chart of a method of processing a data set having a tree structure provided herein. The data set scanned by the scan thread in fig. 4 may be the data set shown in fig. 3.
S401, the scanning thread reads the root directory through readdir, where readdir is a function for reading a directory.
If the scanning thread reads a file, the file is added to a to-be-processed queue and is processed by the processing thread; if the scanning thread reads a subdirectory, the subdirectory is added to the stack where the root directory is located.
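The read step described above can be sketched as follows. This is an illustration only: Python's `os.scandir` stands in for `readdir`, and the function name `scan_directory` and its parameters `pending` and `stack` are hypothetical names that do not come from the source.

```python
import os
import queue


def scan_directory(path, pending, stack):
    """Read one directory: files go to the to-be-processed queue,
    subdirectories are pushed onto the stack that owns this directory."""
    with os.scandir(path) as entries:  # os.scandir plays the role of readdir
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)  # scanned later by the scanning thread
            else:
                pending.put(entry.path)  # consumed by the processing thread
```

A `queue.Queue` is used for `pending` because it is safe to share between the scanning thread and the processing thread without extra locking.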
S402, the scanning thread reads the directories of the scanning layer through readdir.
If the scanning thread reads a file, the file is added to the to-be-processed queue and is processed by the processing thread; if the scanning thread reads a directory of the scanning layer, the directory is added to the stack where the root directory is located; if the scanning thread reads a cousin directory of the stack top directory (i.e., a child directory of a sibling directory of the parent directory of the stack top directory), a new stack is created with the cousin directory as its stack bottom.
S403, the scanning thread scans each stack.
After all the subdirectories of a directory have been scanned, the directory has been completely scanned and exits from the stack; when all the directories in a stack have been scanned, the stack has been completely scanned and exits, the number of stacks is reduced by 1, and a new stack can be generated.
S404, after all the stacks have exited, the scanning is finished.
When all the directories in the stack where the root directory is located are scanned completely, the stack where the root directory is located can be considered to be scanned completely, and the scanning of the whole tree-structured data set is completed.
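Steps S401 to S404 can be sketched end to end as below, under simplifying assumptions. Instead of the scanning-layer rule, this sketch starts a new stack from a subdirectory whenever fewer than `max_stacks` stacks are alive, and it pops a directory before reading its children. The names `scan_tree`, `max_stacks`, `scan_stack` and `process_files` are hypothetical, and `os.scandir` again stands in for `readdir`.

```python
import os
import queue
import threading


def scan_tree(root, max_stacks=4):
    """Scan root with at most max_stacks concurrent stacks; return sorted file paths."""
    pending = queue.Queue()  # files waiting for the processing thread
    processed = []           # written only by the processing thread
    lock = threading.Lock()
    live = [0]               # number of stacks that have not exited yet
    threads = []

    def process_files():
        while True:
            path = pending.get()
            if path is None:  # sentinel: scanning is finished
                break
            processed.append(path)

    def start_stack(bottom):  # the caller must hold the lock
        live[0] += 1
        thread = threading.Thread(target=scan_stack, args=(bottom,))
        threads.append(thread)
        thread.start()

    def scan_stack(bottom):
        stack = [bottom]  # each directory is placed on exactly one stack
        while stack:
            current = stack.pop()
            with os.scandir(current) as entries:
                for entry in entries:
                    if not entry.is_dir(follow_symlinks=False):
                        pending.put(entry.path)
                        continue
                    with lock:
                        if live[0] < max_stacks:
                            start_stack(entry.path)  # bottom of a new stack
                            continue
                    stack.append(entry.path)
        with lock:
            live[0] -= 1  # this stack exits, so another stack may be created

    worker = threading.Thread(target=process_files)
    worker.start()
    with lock:
        start_stack(root)
    index = 0
    while True:  # join every stack, including stacks created along the way
        with lock:
            if index >= len(threads):
                break
            thread = threads[index]
        thread.join()
        index += 1
    pending.put(None)
    worker.join()
    return sorted(processed)
```

Because a subdirectory is either handed to a new stack or pushed onto the current one, never both, the stacks stay disjoint, and the scan is complete once every stack thread has been joined.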
Examples of methods of processing data sets having a tree structure provided herein are described above in detail. It is understood that, in order to realize the above functions, the device for performing the above method includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the present application, the apparatus for processing a data set having a tree structure may be divided into functional units according to the above method example; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit. It should be noted that the division of the units in the present application is schematic and is only a division by logical function; other division manners are possible in an actual implementation.
Fig. 5 shows a possible schematic diagram of the processor involved in the above-described embodiment, in case of an integrated unit. The processor 500 includes: a processing unit 502. The processing unit 502 is used for controlling and managing the operation of the processor 500.
For example, the processing unit 502 is configured to perform: determining at least two stacks according to a tree structure of a data set, wherein data in the data set is stored in the at least two stacks, and an intersection of nodes of any two stacks in the at least two stacks is an empty set; and scanning the at least two stacks and processing the data in the at least two stacks.
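The condition that any two stacks share no nodes can be checked with a few lines of Python. This is a hedged illustration: the function name `stacks_are_disjoint` and the list-of-lists layout are assumptions, and the directory labels are taken loosely from the example in the description.

```python
from itertools import combinations


def stacks_are_disjoint(stacks):
    """Return True if no directory appears in more than one stack."""
    return all(set(a).isdisjoint(b) for a, b in combinations(stacks, 2))


# Assumed contents of stack [1], stack [2] and stack [3] at one point in time.
stacks = [["10"], ["13"], ["15", "16", "17", "18"]]
print(stacks_are_disjoint(stacks))
```

Disjoint stacks are what allow several scanning threads to work without visiting the same directory twice.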
The processor 500 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 500 may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor may also be a combination of devices that implement computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The processing unit 502 may be a processor core.
In one possible design, processor 500 may further include a storage unit 501, where storage unit 501 is used to cache information that processor 500 needs when executing method 200, and storage unit 501 is, for example, a cache (cache) module.
In one possible design, the processor 500 may further include a communication unit 503, and the communication unit 503 is configured to support communication between the processor 500 and other devices or modules, for example, a Solid State Disk (SSD). The communication unit 503 may be a communication interface.
When the processing unit 502 is a processor core, the communication unit 503 is a communication interface, and the storage unit 501 is a cache module, the processor referred to in this application may be the processor shown in fig. 6.
Referring to fig. 6, the processor 600 includes: a processor core 602, a communication interface 603 and a cache module 601. The communication interface 603, the processor core 602, and the cache module 601 may communicate with each other via internal connection paths, and transmit control and/or data signals.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The processor 500 and the processor 600 provided by the present application first determine at least two stacks from the data set before traversing the data set, so that the processor can scan a plurality of stacks simultaneously, thereby improving the efficiency of traversing the data set.
It is to be understood that the processors in the apparatus and method embodiments correspond exactly, and that the respective steps are performed by respective units, e.g. the processor cores perform the determining steps and the processing steps in the method embodiments. The functions of the specific elements may be referred to corresponding method embodiments and will not be described in detail.
In the embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not limit the implementation processes of the present application.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc read only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a terminal device. Of course, the processor and the storage medium may also reside as discrete components in a terminal device.
The computer instructions may be stored in or transmitted from a website, computer, server, or data center, via a wired (e.g., coaxial cable, fiber optic cable, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) manner to another website, computer, server, or data center.
The foregoing embodiments further describe the objects, technical solutions and advantages of the present application in detail. It should be understood that the foregoing embodiments are only examples of the present application and are not intended to limit its scope; any modification, equivalent substitution, improvement or the like made on the basis of the technical solutions of the present application shall fall within the scope of the present application.

Claims (12)

1. A method of processing a data set having a tree structure, the method comprising:
generating at least two stacks according to a tree structure of a data set, wherein the intersection of nodes of any two stacks in the at least two stacks is an empty set;
scanning the at least two stacks and processing data in the at least two stacks;
the data set includes a plurality of directories, and the generating of the at least two stacks according to the tree structure of the data set includes:
generating the at least two stacks using at least two directories in the plurality of directories as at least two stack bottom directories;
the generating the at least two stacks using at least two directories of the plurality of directories as at least two bottom-of-stack directories comprises:
determining a first top-of-stack directory from the plurality of directories, wherein the first top-of-stack directory is a directory which is first not scanned to a subdirectory in the scanning process of the plurality of directories;
generating a first stack according to the first stack top directory and a root directory of the plurality of directories, wherein the root directory is a stack bottom directory of the first stack, the first stack top directory is a stack top directory of the first stack, and the first stack belongs to the at least two stacks;
and generating a second stack according to directories except the directory included by the first stack, wherein the second stack belongs to the at least two stacks, and the stack bottom directory of the second stack is the directory except the directory included by the first stack.
2. The method of claim 1, wherein generating a second stack from a directory other than the directory included in the first stack comprises:
determining a scanning layer of the first stack according to the first stack top directory, wherein the scanning layer is a layer needing to be scanned preferentially in the first stack, the scanning layer is a layer where a non-stack-bottom directory is located, and the scanning layer and the first stack top directory have a preset corresponding relation;
generating the second stack according to M, N and the directories included in the directories in the scanning layer, where M is the number of directories that can be scanned simultaneously in the scanning layer, N is the number of stacks in which the data set can exist simultaneously during being scanned, the second stack belongs to the at least two stacks, the directories included in the directories in the scanning layer do not belong to the first stack, and M and N are positive integers greater than or equal to 2.
3. The method of claim 2, wherein the directory in the scan layer comprises a first directory, and wherein generating the second stack from M, N and the directory comprised by the directory in the scan layer comprises:
scanning the first directory;
and generating the second stack by taking the subdirectory of the first directory as a stack bottom directory.
4. The method of claim 2, wherein the scan level is a level at which a parent directory of the first top-of-stack directory resides.
5. The method of any of claims 1 to 4, wherein said scanning said at least two stacks comprises:
the at least two stacks are scanned simultaneously by at least two scan threads.
6. The method according to any one of claims 1 to 4, further comprising:
and when any one of the at least two stacks is scanned completely, exiting the scanned stack.
7. An apparatus for processing a data set having a tree structure, the apparatus comprising a processing unit configured to:
generating at least two stacks according to a tree structure of a data set, wherein the intersection of nodes of any two stacks in the at least two stacks is an empty set;
scanning the at least two stacks and processing data in the at least two stacks;
the data set includes a plurality of directories, and the processing unit is specifically configured to:
generating the at least two stacks by using at least two directories in the plurality of directories as at least two stack bottom directories, specifically comprising:
determining a first top-of-stack directory from the plurality of directories, wherein the first top-of-stack directory is a directory which is first not scanned to a subdirectory in the scanning process of the plurality of directories;
generating a first stack according to the first stack top directory and a root directory of the plurality of directories, wherein the root directory is a stack bottom directory of the first stack, the first stack top directory is a stack top directory of the first stack, and the first stack belongs to the at least two stacks;
and generating a second stack according to directories except the directory included by the first stack, wherein the second stack belongs to the at least two stacks, and the stack bottom directory of the second stack is the directory except the directory included by the first stack.
8. The apparatus according to claim 7, wherein the processing unit is specifically configured to:
determining a scanning layer of the first stack according to the first stack top directory, wherein the scanning layer is a layer needing to be scanned preferentially in the first stack, the scanning layer is a layer where a non-stack-bottom directory is located, and the scanning layer and the first stack top directory have a preset corresponding relation;
generating the second stack according to M, N and the directories included in the directories in the scanning layer, where M is the number of directories that can be scanned simultaneously in the scanning layer, N is the number of stacks in which the data set can exist simultaneously during being scanned, the second stack belongs to the at least two stacks, the directories included in the directories in the scanning layer do not belong to the first stack, and M and N are positive integers greater than or equal to 2.
9. The apparatus of claim 8, wherein the directory in the scan layer comprises a first directory, and wherein the processing unit is specifically configured to:
scanning the first directory;
and generating the second stack by taking the subdirectory of the first directory as a stack bottom directory.
10. The apparatus of claim 8, wherein the scan level is a level at which a parent directory of the first top-of-stack directory resides.
11. The apparatus according to any one of claims 7 to 10, wherein the processing unit is specifically configured to:
the at least two stacks are scanned simultaneously by at least two scan threads.
12. The apparatus according to any one of claims 7 to 10, wherein the processing unit is further configured to:
and when any one of the at least two stacks is scanned completely, exiting the scanned stack.
CN201710903238.9A 2017-09-29 2017-09-29 Method and device for processing data set with tree structure Active CN107798073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710903238.9A CN107798073B (en) 2017-09-29 2017-09-29 Method and device for processing data set with tree structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710903238.9A CN107798073B (en) 2017-09-29 2017-09-29 Method and device for processing data set with tree structure

Publications (2)

Publication Number Publication Date
CN107798073A (en) 2018-03-13
CN107798073B (en) 2020-07-24

Family

ID=61533871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710903238.9A Active CN107798073B (en) 2017-09-29 2017-09-29 Method and device for processing data set with tree structure

Country Status (1)

Country Link
CN (1) CN107798073B (en)

Also Published As

Publication number Publication date
CN107798073A (en) 2018-03-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant