WO2022148055A1

WO2022148055A1 - File retrieval method and computing device

Info

Publication number: WO2022148055A1
Application number: PCT/CN2021/118423
Authority: WO
Inventors: 龚恒
Original assignee: 统信软件技术有限公司
Priority date: 2021-01-05
Filing date: 2021-09-15
Publication date: 2022-07-14
Also published as: CN112328548A

Abstract

Disclosed is a file retrieval method. The method comprises: receiving a file retrieval request from a client; determining a retrieval manner, retrieval information and the current directory on the basis of the retrieval request, wherein the retrieval manner comprises a file name retrieval manner and/or a file content retrieval manner; determining an index table corresponding to the retrieval manner, wherein the index table comprises a plurality of index items, and each index item comprises an index value and corresponding position information; traversing, in the index table, the index items under the current directory, and sequentially comparing the index values in the index items with the retrieval information, to determine one or more index values that match the retrieval information; and generating a retrieval result on the basis of the one or more index values that match the retrieval information and the corresponding position information, and returning the retrieval result to the client. Further disclosed is a corresponding computing device. By means of the file retrieval method of the present invention, a file can be retrieved according to a file name and file content, and the retrieval efficiency is thus high.

Description

A file retrieval method and computing device

Technical Field

The invention relates to the field of computer technology, and in particular, to a file retrieval method and a computing device.

Background

A file manager is one of the essential applications of a computer operating system, and file retrieval is a function that users often use in a file manager. Users can retrieve files in the file manager based on keywords to quickly locate the storage location of specified files and directories, which greatly improves the user's work efficiency. However, the current file manager cannot retrieve the content of the file according to the keyword specified by the user.

In the prior art, the Linux operating system ships with a file retrieval tool named Find, which can search files and directories by file name, file type, file permissions, file modification time and file size according to keywords. A Find search traverses each file in the specified directory: it reads the file list, locates the corresponding inode information according to the inode number in the file list, retrieves the inode information and matches it against the user's keyword. If the inode type is a directory, it opens that directory, reads its file list, retrieves the information and matches the keyword, and so on recursively. Because a Find search performs many recursive loops, repeatedly opening directories and reading file list information, which requires system calls and memory copying, the retrieval process is time-consuming. Moreover, the user generally searches only by the name of a file or directory and does not care about the creation time, modification time, etc. in the inode information. However, when retrieving by file name with Find, the file name and the inode information are processed together, which reduces the efficiency of searching by file name. In addition, Find cannot retrieve file contents.
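The recursive traversal described above can be sketched as follows. This is only an illustration of the access pattern, not the actual implementation of Find; the function name find_by_name is hypothetical, and Python's os.scandir stands in for the directory reads performed at each level.

```python
import os

def find_by_name(root, keyword):
    """Recursively walk `root`, returning paths whose name contains `keyword`."""
    matches = []
    with os.scandir(root) as entries:          # one directory read per level
        for entry in entries:
            if keyword in entry.name:
                matches.append(entry.path)
            # A directory is opened and read in turn, and so on recursively.
            if entry.is_dir(follow_symlinks=False):
                matches.extend(find_by_name(entry.path, keyword))
    return matches
```

Every directory level costs at least one additional directory read, which is the overhead the index tables of the invention are intended to avoid.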

Therefore, a file retrieval method is required to solve the problems existing in the above technical solutions.

SUMMARY OF THE INVENTION

To this end, the present invention provides a file retrieval method to try to solve or at least alleviate the above problems.

According to an aspect of the present invention, a file retrieval method is provided, which is executed in a file manager of a computing device, the method comprising: receiving a retrieval request for a file from a client; determining a retrieval method, retrieval information and a current directory based on the retrieval request, wherein the retrieval method includes a file name retrieval method and/or a file content retrieval method; determining an index table corresponding to the retrieval method, wherein the index table includes a plurality of index items and each index item includes an index value and corresponding location information, and obtaining the index items in the index table under the current directory; traversing the index items in the index table under the current directory, and comparing the index values in the index items with the retrieval information in turn, to determine one or more index values that match the retrieval information; and generating a retrieval result based on the one or more index values that match the retrieval information and the corresponding location information, and returning the retrieval result to the client.

Optionally, in the file retrieval method according to the present invention, when it is determined that the retrieval method is the file name retrieval method, the method further includes the steps of: monitoring file changes in the computing device in real time; when a file change event is detected, generating a corresponding file change message based on the file change event; and updating the index items in the index table based on the file change message.

Optionally, in the file retrieval method according to the present invention, the file manager includes a monitoring module and an index processing module; the monitoring module is adapted to monitor file changes in the computing device in real time, to generate a corresponding file change message based on a file change event when such an event is detected, and to send the file change message to the index processing module; the index processing module is adapted to update the index items in the index table based on the file change message.

Optionally, in the file retrieval method according to the present invention, the file change includes creating a file, deleting a file, and changing a file name; the step of updating the index table based on the file change message includes: determining the index value and location information of the changed file based on the file change message; and determining the index item corresponding to the file in the index table based on the location information, and updating the index item.
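As a minimal sketch of this update step, assume the file name index is held in memory as a mapping from location (path) to index value (file name). The ChangeMessage shape and the apply_change helper are hypothetical; the text does not specify a message format or how the monitoring module observes changes.

```python
import os
from dataclasses import dataclass

@dataclass
class ChangeMessage:            # hypothetical message shape
    kind: str                   # "create", "delete" or "rename"
    path: str                   # location of the file (after the change)
    old_path: str = ""          # previous location, only used for "rename"

def apply_change(index, msg):
    """Update `index` (path -> file name) in place from one change message."""
    if msg.kind == "create":
        index[msg.path] = os.path.basename(msg.path)
    elif msg.kind == "delete":
        index.pop(msg.path, None)
    elif msg.kind == "rename":
        # Locate the old item by its location, then store the new name.
        index.pop(msg.old_path, None)
        index[msg.path] = os.path.basename(msg.path)
    else:
        raise ValueError(f"unknown change kind: {msg.kind}")
```

Keying the mapping by location mirrors the text: the index item is found through the location information and then updated.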

Optionally, in the file retrieval method according to the present invention, sequentially comparing the index values in the index items with the retrieval information includes: sequentially comparing the index values in the index items with the retrieval information based on the strstr function.

Optionally, in the file retrieval method according to the present invention, the index table includes a file name index table corresponding to the file name retrieval method, and a file content index table corresponding to the file content retrieval method; before receiving the retrieval request, the method further includes the steps of: creating a file name index table, in which the index value is a file name and the location information includes path information; and creating a file content index table, in which the index value is a word element and the location information includes a linked list container, the linked list container including one or more file names.

Optionally, in the file retrieval method according to the present invention, the step of creating a file content index table includes: acquiring the file content of each file, performing word segmentation on the file content to generate a plurality of word elements, and establishing an association between each word element and the file name of the file; generating, based on the one or more file names corresponding to each word element, a linked list container corresponding to the word element; and generating the file content index table as an inverted index structure based on the plurality of word elements and the linked list container corresponding to each word element.
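The inverted structure can be illustrated with a short sketch. It assumes word segmentation has already produced (word element, file name) pairs; the function name build_content_index is hypothetical, and a Python list stands in for the linked list container.

```python
from collections import defaultdict

def build_content_index(pairs):
    """Build word element -> list of file names from (word_element, file_name) pairs."""
    index = defaultdict(list)
    for word_element, file_name in pairs:
        # Each file name is recorded once per word element, in first-seen order.
        if file_name not in index[word_element]:
            index[word_element].append(file_name)
    return dict(index)
```

For example, the pairs ("index", "a.txt"), ("table", "a.txt"), ("index", "b.txt") yield a table in which "index" maps to both files and "table" maps to a.txt only.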

Optionally, in the file retrieval method according to the present invention, if the retrieval method is the file content retrieval method, before traversing the index items in the file content index table under the current directory, the method includes the steps of: obtaining the word elements under the current directory in the file content index table, and obtaining the linked list container corresponding to each word element; traversing the modification time information corresponding to each file name in the linked list container to determine whether the modification time corresponding to each file name is consistent with the actual modification time of the corresponding file stored in the computing device; and if not, determining that the file has been modified, acquiring the modified file stored in the computing device, and updating the file content index table based on the modified file.
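A minimal sketch of this consistency check follows. It assumes each linked list container can be viewed as a mapping from file path to the modification time recorded at indexing time; the name stale_files and the injectable current_mtime parameter are hypothetical, and treating a missing file as changed is an assumption not stated in the text.

```python
import os

def stale_files(container, current_mtime=os.path.getmtime):
    """Return paths whose recorded modification time differs from the actual one.

    `container` maps file path -> modification time recorded in the index.
    """
    stale = []
    for path, recorded in container.items():
        try:
            actual = current_mtime(path)
        except OSError:
            actual = None                 # assumption: a missing file counts as changed
        if actual != recorded:
            stale.append(path)
    return stale
```

The files returned would then be re-read and used to update the file content index table before the traversal starts.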

Optionally, in the file retrieval method according to the present invention, if the retrieval method is both the file name retrieval method and the file content retrieval method, then: the file name index table and the file content index table are obtained respectively; the index items under the current directory in the file name index table are traversed, the file names in the index items are compared with the retrieval information in turn, and one or more file names that match the retrieval information are determined, so as to generate a first retrieval result; the index items under the current directory in the file content index table are traversed, the word elements in the index items are compared with the retrieval information in turn, one or more word elements that match the retrieval information are determined, and the file names corresponding to those word elements are determined, so as to generate a second retrieval result; and the first retrieval result and the second retrieval result are returned to the client.

Optionally, in the file retrieval method according to the present invention, before creating the index table, it includes the step of: converting one or more types of files into plain text format files.

Optionally, in the file retrieval method according to the present invention, the step of converting a file in one or more formats into a plain text format file includes: acquiring the file; performing suffix detection on the file to determine the file type; and acquiring a parsing method corresponding to the file type, and parsing the file based on the parsing method to obtain the plain text content in the file.

Optionally, in the file retrieval method according to the present invention, it further includes the step of: if the file suffix detection fails or the file parsing fails, content detection is performed on the file to obtain the plain text content in the file.

According to one aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, and the program instructions include instructions for performing the file retrieval method described above.

According to an aspect of the present invention, there is provided a readable storage medium storing program instructions, which when read and executed by a computing device, cause the computing device to perform the method as described above.

According to the technical solution of the present invention, a file retrieval method is provided, which can retrieve files based on a file name retrieval method and/or a file content retrieval method. Specifically, according to the file retrieval method of the present invention, files can be retrieved based on file names or based on file content, and can also be retrieved based on a combination of the two retrieval methods. In addition, in the process of matching the retrieval information based on the index table, the present invention only needs to compare the index value (file name or word element) with the retrieval information and does not need to compare file attribute information, which improves retrieval efficiency.

Description of Drawings

To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings, which are indicative of the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent by reading the following detailed description in conjunction with the accompanying drawings. Throughout this disclosure, the same reference numbers generally refer to the same parts or elements.

FIG. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention;

FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention;

FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention; and

FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

FIG. 1 shows a schematic diagram of a computing device 100 according to one embodiment of the present invention.

As shown in FIG. 1, in a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.

Depending on the desired configuration, the processor 104 may be any type of processor including, but not limited to, a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof. Processor 104 may include one or more levels of cache, such as L1 cache 110 and L2 cache 112, processor core 114, and registers 116. Exemplary processor cores 114 may include arithmetic logic units (ALUs), floating point units (FPUs), digital signal processing cores (DSP cores), or any combination thereof. The exemplary memory controller 118 may be used with the processor 104, or in some implementations, the memory controller 118 may be an internal part of the processor 104.

Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to, volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include operating system 120, one or more applications 122, and program data 124. In some embodiments, applications 122 may be arranged to execute instructions using program data 124 by one or more processors 104 on an operating system.

Computing device 100 may also include a storage interface bus 134. Storage interface bus 134 enables communication from storage devices 132 (e.g., removable storage 136 and non-removable storage 138) to basic configuration 102 via bus/interface controller 130. Operating system 120, applications 122, and at least a portion of data 124 may be stored on removable storage 136 and/or non-removable storage 138, and, when computing device 100 is powered on or applications 122 are to be executed, are loaded into system memory 106 via storage interface bus 134 and executed by one or more processors 104.

Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output device 142, peripheral interface 144, and communication device 146) to basic configuration 102 via bus/interface controller 130. Exemplary output devices 142 include graphics processing unit 148 and audio processing unit 150. They may be configured to facilitate communication via one or more A/V ports 152 with various external devices such as displays or speakers. Example peripheral interfaces 144 may include serial interface controller 154 and parallel interface controller 156, which may be configured to facilitate communication via one or more I/O ports 158 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input devices, touch input devices) or other peripherals (e.g., printers, scanners, etc.). The example communication device 146 may include a network controller 160 that may be arranged to facilitate communication via one or more communication ports 164 with one or more other computing devices 162 over a network communication link.

A network communication link may be one example of a communication medium. Communication media may typically embody computer readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media, such as a wired network or a dedicated-line network, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable medium as used herein may include both storage media and communication media.

Computing device 100 may be implemented as a personal computer including a desktop computer and a notebook computer configuration. Of course, computing device 100 may also be implemented as part of a small form factor portable (or mobile) electronic device such as a cellular telephone, digital camera, personal digital assistant (PDA), personal media player device, wireless web browsing device, personal headset, application-specific device, or a hybrid device that includes any of the above functions. It can even be implemented as a server, such as a file server, database server, application server, or web server. The embodiments of the present invention do not limit this.

In an embodiment according to the present invention, the operating system of the computing device 100 is configured to execute the file retrieval method 200 according to the present invention. The operating system of the computing device 100 includes a plurality of program instructions for executing the file retrieval method 200 according to the present invention.

According to one embodiment, the operating system of the computing device 100 includes a file manager configured to execute the file retrieval method 200 of the present invention.

FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention.

It should be noted that the computing device 100 includes a data storage device, and various files can be stored in the data storage device. Here, the present invention does not limit the specific types of files.

As shown in FIG. 2, the method 200 starts at step S210.

In step S210, a user's retrieval request for a file is received from the client. Here, the system desktop of the computing device 100 is adapted to present an interface corresponding to the file manager, so that the user can request file retrieval on that interface. For example, when the user inputs retrieval information (e.g., keywords) through a search box on the interface, a retrieval request for the file is sent to the file manager of the computing device based on the retrieval information.

In this embodiment of the present invention, when sending a retrieval request, the user may also select a retrieval method for the file. The retrieval methods include a file name retrieval method and a file content retrieval method. Here, the user can select either one of the two retrieval methods, so that files are retrieved based on the file name retrieval method or the file content retrieval method alone; the user can also select a combination of the two, so that files are retrieved based on both the file name retrieval method and the file content retrieval method.

Then, in step S220, the retrieval method, retrieval information and current directory selected by the user are determined based on the retrieval request of the user. It should be understood that the retrieval method determined here may be a file name retrieval method and/or a file content retrieval method. It should also be noted that the retrieval of files in the file manager is usually based on the current file directory, and the file manager can determine the current directory according to the user's retrieval request.

It should be noted that the two retrieval methods provided by the present invention each perform file retrieval based on a corresponding index table. Specifically, a file name index table corresponding to the file name retrieval method and a file content index table corresponding to the file content retrieval method can be established in advance, so that the file manager of the present invention performs file retrieval using the index table corresponding to the retrieval method.

In step S230, an index table corresponding to the retrieval mode is determined, the index table includes a plurality of index items, each index item includes an index value and corresponding file location information, and then the index items in the index table under the current directory are obtained. It should be pointed out that, based on the index item in the current directory, by matching the index value with the retrieval information, the retrieval of the file can be realized.

Here, the index value in each index item is, for example, a keyword corresponding to a file or to file content, and the location information in each index item is the storage location of the file or file content corresponding to the index value. Specifically, in the file name index table corresponding to the file name retrieval method, the index value may be the file name, and the location information corresponding to the index value includes path information of the file corresponding to the file name. In the file content index table corresponding to the file content retrieval method, the index value may be a word element generated by word segmentation of the file content; the word element may include a phrase and the parent directory path information of the file where the phrase is located, and the location information includes the file name of the file corresponding to the word element. Each index item in the file content index table may include the file names of a plurality of files corresponding to the index value (word element). Specifically, each index value (word element) in the file content index table may correspond to one or more file names, and the one or more file names constitute a linked list container corresponding to the word element.
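The two kinds of index item, and the selection of items under the current directory, might be modelled as in the following sketch. The class and function names (NameItem, ContentItem, is_under and so on) are hypothetical, paths are assumed to be POSIX-style, and a Python list stands in for the linked list container.

```python
import posixpath
from dataclasses import dataclass, field

@dataclass
class NameItem:
    file_name: str                 # index value
    path: str                      # location information: path of the file

@dataclass
class ContentItem:
    phrase: str                    # word element: the phrase ...
    parent_dir: str                # ... plus the parent directory path of its file(s)
    file_names: list = field(default_factory=list)   # stands in for the linked list container

def is_under(directory, current_dir):
    """True if `directory` is `current_dir` itself or nested below it."""
    current = current_dir.rstrip("/")
    return directory == current or directory.startswith(current + "/")

def name_items_under(table, current_dir):
    return [i for i in table if is_under(posixpath.dirname(i.path), current_dir)]

def content_items_under(table, current_dir):
    return [i for i in table if is_under(i.parent_dir, current_dir)]
```

Because each word element carries its parent directory path, the content table can be restricted to the current directory without opening any file.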

Then, in step S240, the index items in the index table under the current directory are traversed, and the index values in the index items are compared with the user's retrieval information in turn, so as to determine one or more index values matching the retrieval information. It should be understood that the file corresponding to an index value matching the retrieval information is the retrieved target file.

It should be noted that, according to the file retrieval method of the present invention, only the index values in the index table need to be compared with the retrieval information, and attribute information such as the creation time and modification time of the file need not be compared. Specifically, when retrieving based on the file name retrieval method, only the file names under the current directory in the file name index table need to be compared with the user's retrieval information; when retrieving based on the file content retrieval method, only the word elements under the current directory in the file content index table need to be compared with the user's retrieval information. In this way, the present invention only needs to compare the file names and word elements in the index table with the retrieval information, so that the retrieval efficiency is higher.

In one embodiment, the present invention sequentially compares the index values in the index items with the retrieval information based on the strstr function. This comparison does not need to invoke system calls, so memory consumption is low and the comparison is fast, which helps improve retrieval efficiency.
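As a rough illustration, the sequential comparison amounts to a substring test on each index value. The sketch below uses Python's in operator as a stand-in for the C library function strstr, which likewise searches for the first occurrence of one string inside another; the function name match_index_values is hypothetical.

```python
def match_index_values(index_values, retrieval_info):
    """Return, in order, the index values that contain `retrieval_info`."""
    # `retrieval_info in value` plays the role of strstr(value, retrieval_info) != NULL.
    return [value for value in index_values if retrieval_info in value]
```

Only the index values themselves are inspected; no file attributes are read during the comparison.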

In one embodiment, when traversing the index items in the index table, a predetermined number of index items may be loaded into memory each time and matched with the retrieval information, so that the index items are loaded part by part for comparison with the retrieval information, which keeps the memory footprint as small as possible and further improves retrieval efficiency.
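Batch-wise loading can be sketched with a generator that hands out a fixed number of index values at a time. The names iter_batches and search_in_batches and the default batch size are assumptions for illustration; the text does not state how the predetermined number is chosen.

```python
from itertools import islice

def iter_batches(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def search_in_batches(index_values, retrieval_info, batch_size=1000):
    """Match index values against `retrieval_info`, holding one batch in memory at a time."""
    matches = []
    for batch in iter_batches(index_values, batch_size):
        matches.extend(v for v in batch if retrieval_info in v)
    return matches
```

If index_values is itself a lazy reader over the index table on disk, only batch_size values are resident at any moment.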

In addition, if the file content retrieval method is adopted, when traversing the file content index table, the index items can be read successively based on byte size, and the word elements are compared with the retrieval information in turn. In this way, high memory usage during retrieval based on file content is avoided.

Finally, in step S250, a retrieval result is generated based on one or more index values matching the retrieval information and the position information corresponding to each index value, and the retrieval result is returned to the client. Therefore, the user can view and acquire the corresponding target file based on the position information corresponding to each index value in the retrieval result.

It can be seen that, according to the file retrieval method 200 of the present invention, the file can be retrieved based on the file name and the file can be retrieved based on the file content. In addition, the file can also be retrieved based on the combination of the two retrieval methods of file name and file content.

According to an embodiment of the present invention, if it is determined from the user's retrieval request that the requested retrieval method is both the file name retrieval method and the file content retrieval method, that is, files are retrieved by combining the two methods, then the retrieval results obtained from the two retrieval methods are together returned to the client as the final retrieval result.

Specifically, if it is determined that the retrieval methods are the file name retrieval method and the file content retrieval method, the file name index table and the file content index table are obtained respectively. Further, the index items under the current directory in the file name index table are traversed, the file names in the index items are compared with the retrieval information in turn, and one or more file names that match the retrieval information are determined, so as to generate a first retrieval result. Here, the first retrieval result is generated based on the one or more file names matching the retrieval information and the location information of the file corresponding to each file name. Similarly, the index items under the current directory in the file content index table are traversed, the word elements in the index items are compared with the retrieval information in turn to determine one or more word elements that match the retrieval information, and the location information (including one or more file names) corresponding to each matching word element is determined; a second retrieval result is then generated based on the one or more word elements matching the retrieval information and the location information corresponding to each word element. Finally, the first retrieval result and the second retrieval result are returned to the client together.
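The combined retrieval can be sketched as two independent passes whose results are returned together. The sketch assumes both tables have already been restricted to the current directory and are held as plain mappings; combined_search is a hypothetical name.

```python
def combined_search(name_table, content_table, retrieval_info):
    """Return (first_result, second_result) for a combined retrieval.

    name_table:    file name    -> path of the file
    content_table: word element -> list of file names (the container)
    """
    # First result: matching file names with their location information.
    first = [(name, path) for name, path in name_table.items()
             if retrieval_info in name]
    # Second result: matching word elements with their file names.
    second = [(word, files) for word, files in content_table.items()
              if retrieval_info in word]
    return first, second
```

The two results are kept separate here, as in the text; whether a client merges or de-duplicates them is not specified.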

According to an embodiment of the present invention, before performing step S210 (receiving a retrieval request from the user at the client), a file name index table is created, in which the index value of each index item is a file name and the location information includes the path information of the corresponding file; and a file content index table is created, in which the index value of each index item is a word element, the location information corresponding to the word element includes a linked list container, and the linked list container includes the one or more file names corresponding to the word element.

It should be noted that, before creating the index table, all files of one or more types stored in the computing device are converted into plain text format files. Specifically, the various types of files stored locally on the computing device are acquired, and suffix detection is performed on each file to determine its file type. Then, a parsing method corresponding to the detected file type is acquired, and the file is parsed based on the parsing method, so that the plain text content in the file can be acquired. In a specific implementation, if the file suffix detection fails or the file parsing fails, content detection may be performed on the file to obtain the plain text content in the file.
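The conversion flow (suffix detection, type-specific parsing, fallback to content detection) might look like the following sketch. The PARSERS registry, the two example parsers and the sniff_text fallback are all hypothetical; in particular, the text does not say how content detection works, so the fallback here simply keeps whatever decodes as printable UTF-8 text.

```python
import os
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects the text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

def parse_html(raw):
    parser = _TextOnly()
    parser.feed(raw.decode("utf-8"))
    parser.close()
    return "".join(parser.parts)

def parse_text(raw):
    return raw.decode("utf-8")

PARSERS = {".txt": parse_text, ".html": parse_html}    # hypothetical registry

def sniff_text(raw):
    """Fallback 'content detection' (assumed): keep printable characters only."""
    text = raw.decode("utf-8", errors="ignore")
    return "".join(ch for ch in text if ch.isprintable() or ch.isspace())

def to_plain_text(file_name, raw):
    suffix = os.path.splitext(file_name)[1].lower()    # suffix detection
    parser = PARSERS.get(suffix)
    if parser is not None:
        try:
            return parser(raw)
        except ValueError:          # includes UnicodeDecodeError: parsing failed
            pass
    return sniff_text(raw)          # unknown suffix or failed parse
```

Formats such as office documents or PDF would need their own entries in the registry; none are shown because the text does not name specific parsing libraries.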

It should also be pointed out that the present invention does not limit the specific types of files. The file types are, for example, MS Office series files, WPS Office series files, PDF files, e-mail files (EML) or hypertext files (HTML), but are not limited to the file types listed.

In one embodiment, each index entry in the filename index table may further include file type information.

FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention. As shown in FIG. 3, the file name index table includes header data and file data, and the header data includes the root directory name. The file data includes a plurality of directory information units, namely directory information unit 1, directory information unit 2, ..., directory information unit n. Each directory information unit corresponds to a directory and includes index items corresponding to one or more files in that directory, and each index item includes a corresponding file name field and file type field. For example, directory information unit 1 includes an index item corresponding to file 1 (file 1 name field and file 1 type field) and an index item corresponding to file 2 (file 2 name field and file 2 type field), but is not limited to this. Here, the file name fields and the file type fields constitute the directory content information of the directory information unit.

In one embodiment, the file name field stores the name of an ordinary file or directory, and the file type field has two cases. When the entry is an ordinary file, this field occupies one byte; when it is a directory, this field occupies four bytes, identifies the entry as a directory, and records the offset of the first file in that directory so that the directory can be traversed.

In addition, each directory information unit also includes corresponding directory end information, which includes a directory end identifier for the current level. The directory end identifier marks the end of the content information of the directory at this level. The directory end information also records the offset of the parent directory, so that the name of the parent directory can be obtained and combined with the file name to form the full path of the file. The file can then be obtained based on its full path.

It can be seen that the file name index table shown in FIG. 3 includes index entries corresponding to files in multiple directories, and each index entry includes a file name, a file type, and corresponding file location information, where the location information includes the corresponding path information.
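As a non-limiting illustration of how the parent offset recorded in the directory end information allows a full path to be rebuilt, the following minimal sketch models the directory information units in memory. The class names, the field names, and the use of list positions in place of byte offsets are assumptions made for this sketch; the embodiment does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexEntry:
    name: str                         # file name field
    is_directory: bool = False        # file type field
    child_unit: Optional[int] = None  # stand-in for the offset of the directory's first file


@dataclass
class DirectoryUnit:
    name: str                         # directory name (the root name comes from the header data)
    parent_unit: Optional[int]        # stand-in for the parent offset in the directory end information
    entries: List[IndexEntry] = field(default_factory=list)


def full_path(units: List[DirectoryUnit], unit_index: int, file_name: str) -> str:
    """Follow parent links up to the root and join the names into a full path."""
    parts = [file_name]
    current: Optional[int] = unit_index
    while current is not None:
        parts.append(units[current].name)
        current = units[current].parent_unit
    return "/".join(reversed(parts))


units = [
    DirectoryUnit("/home", None, [IndexEntry("jerry", True, child_unit=1)]),
    DirectoryUnit("jerry", 0, [IndexEntry("example.doc")]),
]
print(full_path(units, 1, "example.doc"))  # /home/jerry/example.doc
```

Because each unit stores only its own name and a link to its parent, the path is not duplicated in every index entry; it is assembled on demand once a matching file name has been found.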

FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention. According to an embodiment, the file content index table can be created according to the following method, and the specific steps include:

First, the file content of each file is obtained, word segmentation is performed on the file content to generate multiple word elements, and an association is established between each word element and the file name of the file. Here, the present invention does not limit the file type. It should be noted that word segmentation splits the content of the file into multiple word units and removes punctuation marks and meaningless words. In one embodiment, the file content can be obtained through the Reader tool, multiple phrases can be generated by performing word segmentation on the file content, and the corresponding word element can be generated by combining each phrase with the parent directory path of the file in which it is located. That is, each word element includes a phrase and the parent directory path of the file in which the phrase is located.

For example, suppose a piece of content in the file is "Today's weather is really nice, I'm going to climb a mountain!". After word segmentation of the content, the following phrases are generated: "today", "weather", "good", "me", "climbing", and so on. Further, each word element is obtained by combining a phrase with the parent directory path of the current file. For example, if the "/home/jerry" directory contains an "example.doc" document that includes the above content, the following word elements are generated: "/home/jerry/:today", "/home/jerry/:weather", and so on.
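A minimal sketch of this word element generation is given below for illustration only. The regular expression and the stop-word list are simplified stand-ins for a real word segmentation tool, and the function name is hypothetical; only the "parent directory path + ':' + phrase" shape of the word element follows the example above.

```python
import posixpath
import re
from typing import List

# Illustrative stop words only; a real segmenter would use its own dictionary.
STOP_WORDS = {"the", "is", "really", "a", "to"}


def make_word_elements(file_path: str, content: str) -> List[str]:
    """Split content into phrases and prefix each with the parent directory path."""
    parent = posixpath.dirname(file_path) + "/"
    phrases: List[str] = []
    for phrase in re.findall(r"[a-z]+", content.lower()):
        # Drop meaningless words and repeated phrases, keeping first-seen order.
        if phrase not in STOP_WORDS and phrase not in phrases:
            phrases.append(phrase)
    return [f"{parent}:{phrase}" for phrase in phrases]


print(make_word_elements("/home/jerry/example.doc", "Today the weather is really nice"))
# ['/home/jerry/:today', '/home/jerry/:weather', '/home/jerry/:nice']
```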

Further, a linked list container corresponding to each word element is generated based on the one or more file names corresponding to that word element. Here, each word element corresponds to one linked list container, and the linked list container includes the one or more file names corresponding to the word element.

Finally, a file content index table is generated in an inverted index structure based on the multiple word elements and the linked list container corresponding to each word element.

It should be pointed out that the file content index table shown in FIG. 4 is a file content index table with an inverted index structure, created according to the above-mentioned method. As shown in FIG. 4, the file content index table includes a plurality of index entries, and each index entry includes a word element and one or more file names corresponding to the word element. For example, in the first index entry, the index value is word element 1, and the file names corresponding to word element 1 include document 1, document 2, document 4, and so on; in the second index entry, the index value is word element 2, and the file names corresponding to word element 2 include document 1, document 5, and so on; correspondingly, in the nth index entry, the index value is word element n, and the file names corresponding to word element n include document 3, and so on.

In one embodiment, each index entry in the file content index table further includes file attribute information. The file attribute information includes, for example, information such as creation time, modification time, and file size of the file.

It should be understood that, in the created file content index table, each word element includes a phrase and the parent directory path of the file in which the phrase is located, and the linked list container corresponding to the word element includes one or more file names corresponding to the word element. In this way, for each index entry in the file content index table, the parent directory path of the corresponding file can be determined based on the word element, and the one or more corresponding file names can be determined based on the linked list container, so that the exact location of each file can be determined from the parent directory path and the file name.
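For illustration only, an inverted index of this kind and the way a file location is recovered from an index entry might be sketched as follows. A Python dictionary of lists stands in for the index table and its linked list containers, and the function names and the input format are assumptions of this sketch rather than features of the embodiment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_content_index(
    files: Iterable[Tuple[str, str, List[str]]]
) -> Dict[str, List[str]]:
    """Map each word element to the file names that contain its phrase.

    Each input item is (parent directory path ending in '/', file name, phrases).
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for parent_dir, file_name, phrases in files:
        for phrase in dict.fromkeys(phrases):  # de-duplicate, keep order
            index[f"{parent_dir}:{phrase}"].append(file_name)
    return dict(index)


def locate(index: Dict[str, List[str]], word_element: str) -> List[str]:
    """Combine the parent path in the word element with each stored file name."""
    parent_dir, _, _phrase = word_element.rpartition(":")
    return [parent_dir + file_name for file_name in index.get(word_element, [])]


index = build_content_index([
    ("/home/jerry/", "example.doc", ["today", "weather"]),
    ("/home/jerry/", "notes.txt", ["today"]),
    ("/home/tom/", "plan.txt", ["today"]),
])
print(locate(index, "/home/jerry/:today"))
# ['/home/jerry/example.doc', '/home/jerry/notes.txt']
```

Since the parent directory path is part of the key, the same phrase appearing under two directories yields two separate index entries, which is what allows a lookup to be restricted to one directory.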

According to one embodiment, when the retrieval mode is the file name retrieval mode, in the process of matching the index value (file name) in the file name index table with the retrieval information, the following method is also performed synchronously to update the file name index table:

File changes in the computing device are monitored in real time. When a file change event is detected, a corresponding file change message is generated based on the file change event, and the corresponding index entry in the file name index table is updated based on the file change message.

Specifically, the file manager includes a monitoring module and an index processing module, which are communicatively connected. The monitoring module can monitor file changes in the computing device in real time; the specific types of file changes it monitors include creating files, deleting files, and changing file names. When the monitoring module detects such a file change event, it generates a corresponding file change message based on the event. Correspondingly, the file change messages include a file creation message, a file deletion message, and a file name change message. Subsequently, the monitoring module sends the file change message to the index processing module, so that the index processing module updates the corresponding index entry in the file name index table based on the file change message.
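As a rough, non-limiting illustration of how file change messages could be produced, the sketch below compares two snapshots of the set of file paths and emits creation and deletion messages. This polling comparison is a substitute technique chosen for brevity, not the real-time monitoring of the embodiment, and it cannot recognize a file name change (a rename appears as one deletion plus one creation). The message format and the function name are assumptions.

```python
from typing import Dict, List, Set


def diff_snapshots(before: Set[str], after: Set[str]) -> List[Dict[str, str]]:
    """Turn the difference between two path snapshots into change messages."""
    messages = [{"type": "delete", "path": p} for p in sorted(before - after)]
    messages += [{"type": "create", "path": p} for p in sorted(after - before)]
    return messages


before = {"/home/jerry/a.txt", "/home/jerry/b.txt"}
after = {"/home/jerry/b.txt", "/home/jerry/c.txt"}
for message in diff_snapshots(before, after):
    print(message)
# {'type': 'delete', 'path': '/home/jerry/a.txt'}
# {'type': 'create', 'path': '/home/jerry/c.txt'}
```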

In one embodiment, when the index processing module receives a file change message from the monitoring module and updates the index table accordingly, it first determines the index value and location information of the changed file based on the file change message. Then, based on the location information of the changed file, it determines the corresponding index entry of the file in the file name index table and updates that index entry according to the specific type of file change (creating a file, deleting a file, or changing a file name) indicated by the file change message.

Specifically, when the index processing module receives a file creation message, the location information it can determine from the message is the parent directory path of the newly created file. The file name index table is searched to determine the position of the index entries under that parent directory path, and the file name information of the newly created file is then inserted under the parent directory path, so that a new index entry corresponding to the newly created file is inserted into the file name index table.

When the index processing module receives a file deletion message, it can determine the full path of the deleted file based on the message. It then searches the file name index table using the full path as the keyword to determine the index entry corresponding to the deleted file, and deletes that index entry from the file name index table.

When the index processing module receives a file name change message, it can determine from the message the full path of the source file before the change and the full path of the new file after the change. The file name index table is searched using the full path of the source file as the keyword to determine the index entry corresponding to the source file, and that entry is deleted. Then, the file name index table is searched using the parent directory path of the new file as the keyword to determine the position of the index entries under that parent directory path, and the changed file name is inserted under the parent directory path, thereby creating the index entry corresponding to the renamed file.
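The three update cases above can be sketched, purely for illustration, with a dictionary that maps each parent directory path to the list of file names under it. The message fields ("type", "path", "old_path", "new_path") and the function name are hypothetical; the sketch only mirrors the described logic that a file name change is handled as a deletion of the source entry followed by an insertion of the new entry.

```python
import posixpath
from typing import Dict, List


def apply_change(index: Dict[str, List[str]], message: Dict[str, str]) -> None:
    """Update a {parent directory: [file names]} index from one change message."""
    kind = message["type"]
    if kind == "create":
        parent, name = posixpath.split(message["path"])
        index.setdefault(parent, []).append(name)
    elif kind == "delete":
        parent, name = posixpath.split(message["path"])
        if name in index.get(parent, []):
            index[parent].remove(name)
    elif kind == "rename":
        # Delete the source entry, then insert the entry for the new name.
        apply_change(index, {"type": "delete", "path": message["old_path"]})
        apply_change(index, {"type": "create", "path": message["new_path"]})
    else:
        raise ValueError(f"unknown change type: {kind}")


index = {"/home/jerry": ["a.txt"]}
apply_change(index, {"type": "create", "path": "/home/jerry/b.txt"})
apply_change(index, {"type": "rename",
                     "old_path": "/home/jerry/a.txt",
                     "new_path": "/home/jerry/c.txt"})
apply_change(index, {"type": "delete", "path": "/home/jerry/b.txt"})
print(index)  # {'/home/jerry': ['c.txt']}
```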

According to one embodiment, if the retrieval mode is the file content retrieval mode, before traversing the index items in the file content index table under the current directory, the following steps are performed to update the file content index table:

First, the word elements under the current directory requested by the user are obtained from the file content index table, together with the linked list container corresponding to each word element.

Subsequently, the modification time information recorded for each file name in the linked list container is traversed. For each file name, the actual modification time of the corresponding file is obtained from the computing device, and it is determined whether the recorded modification time is consistent with that actual modification time.

If the two are inconsistent, it is determined that the file has been modified; the modified file stored in the computing device is acquired, and the file content index table is updated based on the modified file.

Specifically, if it is determined that the file has been modified, it is further determined whether the file was newly created or deleted. If the file was newly created, word segmentation is performed on its content to create the corresponding word elements, and the newly created word elements and the file name are inserted into the file content index table as new index entries, thereby updating the file content index table. If the file was deleted, the word elements corresponding to the file are deleted from the file content index table. In this way, before files are retrieved based on the file content index table, the index entries for newly created or deleted files in the current directory are updated, which ensures that the retrieval information is matched against a file content index table that is consistent with the latest file state. The retrieval result therefore reflects the current file state, and the user obtains an accurate and valid target file.
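A minimal sketch of the consistency check performed before retrieval is shown below for illustration. It compares recorded modification times with actual ones and reports which files have changed and which no longer exist. The function name, the return shape, and the injectable `get_mtime` parameter are assumptions of this sketch; detecting newly created files that are not yet in the index would require an additional directory listing, which is omitted here.

```python
import os
from typing import Callable, Dict, List, Tuple


def find_stale_files(
    recorded: Dict[str, float],
    get_mtime: Callable[[str], float] = os.path.getmtime,
) -> Tuple[List[str], List[str]]:
    """Return (modified paths, missing paths) by comparing modification times."""
    modified: List[str] = []
    missing: List[str] = []
    for path, recorded_mtime in recorded.items():
        try:
            actual_mtime = get_mtime(path)
        except FileNotFoundError:
            missing.append(path)       # file was deleted: drop its word elements
            continue
        if actual_mtime != recorded_mtime:
            modified.append(path)      # file changed: re-segment and re-index it
    return modified, missing
```

Passing a custom `get_mtime` makes the check easy to exercise without touching the file system; by default it queries the actual files through `os.path.getmtime`.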

It can be seen that, according to the file retrieval method of the present invention, a file can be retrieved in a file name retrieval mode and/or a file content retrieval mode. Specifically, the present invention can retrieve files based on the file name, based on the file content, or based on a combination of the two. In addition, when matching the retrieval information against the index table, the present invention only needs to compare the index value (file name or word element) with the retrieval information and does not need to compare the file attribute information, which improves retrieval efficiency.
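For illustration only, the matching step in the file name retrieval mode might look like the sketch below, which traverses the index entries under the current directory and compares each file name with the retrieval information by substring test (Python's `in` operator plays the role that a `strstr`-style comparison would play in C). The index shape, the function name, and the choice to include subdirectories of the current directory are assumptions of this sketch.

```python
from typing import Dict, List


def search_file_names(
    index: Dict[str, List[str]], current_dir: str, query: str
) -> List[str]:
    """Return full paths under current_dir whose file name contains query."""
    prefix = current_dir.rstrip("/") + "/"
    results: List[str] = []
    for directory, names in index.items():
        # Keep only the current directory and (by assumption) its subdirectories.
        if not (directory + "/").startswith(prefix):
            continue
        for name in names:
            if query in name:  # only the index value is compared, not attributes
                results.append(f"{directory}/{name}")
    return results


index = {
    "/home/jerry": ["report.doc", "notes.txt"],
    "/home/jerry/work": ["report-final.doc"],
    "/home/tom": ["report.doc"],
}
print(search_file_names(index, "/home/jerry", "report"))
# ['/home/jerry/report.doc', '/home/jerry/work/report-final.doc']
```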

The various techniques described herein can be implemented in hardware, software, or a combination thereof. Thus, the method and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in a tangible medium, such as a removable hard disk, a USB stick, a floppy disk, a CD-ROM, or any other machine-readable storage medium, wherein, when the program code is loaded into a machine, such as a computer, and executed by the machine, the machine becomes an apparatus for practicing the invention.

Where the program code is executed on a programmable computer, the computing device typically includes a processor, a storage medium readable by the processor (including volatile and nonvolatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code, and the processor is configured to execute the file retrieval method of the present invention according to the instructions in the program code stored in the memory.

By way of example and not limitation, readable media include readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.

In the specification provided herein, the algorithms and displays are not inherently related to any particular computer, virtual system, or other device. Various general purpose systems may also be used with examples of the present invention. The structure required to construct such a system is apparent from the above description. Furthermore, the present invention is not directed to any particular programming language. It is to be understood that various programming languages may be used to implement the inventions described herein, and that the descriptions of specific languages above are intended to disclose the best mode for carrying out the invention.

In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it is to be understood that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or its description. This disclosure, however, should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules or units or components of the apparatus in the examples disclosed herein may be arranged in the apparatus as described in this embodiment, or alternatively may be positioned differently from the apparatus in this example in one or more devices. The modules in the preceding examples may be combined into one module or further divided into sub-modules.

Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and they may further be divided into multiple sub-modules or sub-units or sub-components. All features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that although some of the embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are intended to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Furthermore, some of the described embodiments are described herein as methods or combinations of method elements that can be implemented by a processor of a computer system or by other means for performing the described functions. Thus, a processor having the necessary instructions for implementing the method or method element forms means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.

As used herein, unless otherwise specified, the use of the ordinal numbers "first," "second," "third," etc. to describe common objects merely refers to different instances of similar objects, and is not intended to imply that the objects so described must have a given order in time, space, ranking, or in any other way.

While the invention has been described in terms of a limited number of embodiments, those skilled in the art will appreciate, having the benefit of the above description, that other embodiments are conceivable within the scope of the invention thus described. Furthermore, it should be noted that the language used in this specification has been principally selected for readability and teaching purposes, rather than to explain or define the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the appended claims. This disclosure is intended to be illustrative, not restrictive, as to the scope of the present invention, which is defined by the appended claims.

Claims

A file retrieval method, executed in a computing device, the method comprising:

receiving a retrieval request for a file from a client;

determining a retrieval mode, retrieval information, and a current directory based on the retrieval request, wherein the retrieval mode includes a file name retrieval mode and/or a file content retrieval mode;

determining an index table corresponding to the retrieval mode, wherein the index table includes a plurality of index entries and each index entry includes an index value and corresponding location information, and obtaining the index entries in the index table under the current directory;

traversing the index entries under the current directory in the index table, and sequentially comparing the index values in the index entries with the retrieval information to determine one or more index values matching the retrieval information; and

generating a retrieval result based on the one or more index values matching the retrieval information and the corresponding location information, and returning the retrieval result to the client.
The method according to claim 1, wherein, when the retrieval mode is determined to be a file name retrieval mode, the method further comprises the steps of:

monitoring file changes in the computing device in real time;

when a file change event is detected, generating a corresponding file change message based on the file change event; and

updating index entries in the index table based on the file change message.
The method of claim 2, wherein the computing device includes a file manager, and the file manager includes a monitoring module and an index processing module;

The monitoring module is adapted to monitor file changes in the computing device in real time, and when a file change event is monitored, generates a corresponding file change message based on the file change event, and is adapted to send the file change message to the index processing module;

The index processing module is adapted to update the index entries in the index table based on the file change message.
The method of claim 2, wherein the file change includes creating a file, deleting a file, and changing a file name; the step of updating the index table based on the file change message includes:

determining the index value and location information of the changed file based on the file change message; and

determining the index entry corresponding to the file in the index table based on the location information, and updating the index entry.
The method according to any one of claims 1-4, wherein sequentially comparing the index values in the index items with the retrieval information comprises:

sequentially comparing the index values in the index items with the retrieval information based on the strstr function.
The method according to claim 1, wherein the index table comprises a file name index table corresponding to the file name retrieval mode and a file content index table corresponding to the file content retrieval mode; and the method further comprises, beforehand, the steps of:

creating a file name index table, wherein the index value in the file name index table is a file name and the location information includes path information; and

creating a file content index table, wherein the index value in the file content index table is a word element, the location information includes a linked list container, and the linked list container includes one or more file names.
The file retrieval method of claim 6, wherein the step of creating the file content index table comprises:

obtaining the file content in each file, performing word segmentation on the file content to generate multiple word elements, and establishing an association between each word element and the file name of the file;

generating a linked list container corresponding to each word element based on the one or more file names corresponding to that word element; and

generating the file content index table in an inverted index structure based on the multiple word elements and the linked list containers corresponding to the word elements.
The method according to claim 6 or 7, wherein, if the retrieval mode is a file content retrieval mode, the method comprises, before traversing the index items in the file content index table under the current directory, the steps of:

obtaining the word element under the current directory in the file content index table, and obtaining the linked list container corresponding to the word element;

traversing the modification time information corresponding to each file name in the linked list container to determine whether the modification time corresponding to each file name is consistent with the actual modification time of the corresponding file stored in the computing device; and

if not, determining that the file is modified, acquiring the modified file stored in the computing device, and updating the file content index table based on the modified file.
The file retrieval method according to claim 6, wherein, if the retrieval mode is a file name retrieval mode and a file content retrieval mode, then:

obtaining the file name index table and the file content index table respectively;

traversing the index entries under the current directory in the file name index table, comparing the file names in the index entries with the retrieval information in turn, and determining one or more file names that match the retrieval information, to generate a first retrieval result;

traversing the index entries under the current directory in the file content index table, comparing the word elements in the index entries with the retrieval information in turn, determining one or more word elements that match the retrieval information, and determining the file names corresponding to the word elements, to generate a second retrieval result; and

returning the first retrieval result and the second retrieval result to the client.
The file retrieval method as claimed in claim 6, wherein, before creating the index table, it comprises the steps of:

Convert one or more types of files to plain text format files.
The file retrieval method of claim 10, wherein the step of converting one or more types of files into plain text format files comprises:

acquiring the file;

performing suffix detection on the file to determine the file type; and

acquiring a parsing method corresponding to the file type, and parsing the file based on the parsing method to obtain plain text content in the file.
The file retrieval method of claim 11, further comprising the steps of:

If the file suffix detection fails or the file parsing fails, content detection is performed on the file to obtain plain text content in the file.
A computing device comprising:

at least one processor; and

a memory storing program instructions, wherein the program instructions are configured to be adapted for execution by the at least one processor, the program instructions comprising instructions for performing the method of any one of claims 1-12.
A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method according to any one of claims 1-12.