WO2022148055A1 - File retrieval method and computing device - Google Patents

File retrieval method and computing device

Info

Publication number
WO2022148055A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
retrieval
index
index table
information
Prior art date
Application number
PCT/CN2021/118423
Other languages
French (fr)
Chinese (zh)
Inventor
龚恒
Original Assignee
统信软件技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 统信软件技术有限公司
Publication of WO2022148055A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Definitions

  • the invention relates to the field of computer technology, and in particular, to a file retrieval method and a computing device.
  • a file manager is one of the essential applications of a computer operating system, and file retrieval is a function that users often use in a file manager. Users can retrieve files in the file manager based on keywords to quickly locate the storage location of specified files and directories, which greatly improves the user's work efficiency. However, the current file manager cannot retrieve the content of the file according to the keyword specified by the user.
  • The Linux operating system ships with a file retrieval tool named Find, which can search files and directories by file name, file type, file permissions, file modification time and file size according to keywords.
  • A Find search traverses every file in the specified directory: it reads the file list, looks up the inode information for each inode number in the list, and matches that information against the user's keyword. If the inode type is a directory, Find opens that directory, reads its file list, matches again, and so on recursively. Because each level of recursion opens a directory and reads its file list, which requires system calls and memory copies, the retrieval process is time-consuming.
  • In practice, the user generally searches only by the name of a file or directory and does not care about the creation time, modification time and other attributes in the inode information.
  • However, the file name and the inode information are stored together, which reduces the efficiency of searching by name. In addition, Find cannot retrieve file contents.
  • the present invention provides a file retrieval method to try to solve or at least alleviate the above problems.
  • A file retrieval method is provided, which is executed in a file manager of a computing device, the method comprising: receiving a retrieval request for a file from a client; determining a retrieval method, retrieval information and a current directory based on the retrieval request, where the retrieval method includes a file name retrieval method and/or a file content retrieval method; determining an index table corresponding to the retrieval method, where the index table includes a plurality of index entries and each index entry includes an index value and corresponding location information, and obtaining the index entries in the index table under the current directory; traversing the index entries under the current directory and comparing the index value in each index entry with the retrieval information in turn, to determine one or more index values that match the retrieval information; and generating a retrieval result based on the one or more matching index values and the corresponding location information, and returning the retrieval result to the client.
  • When the retrieval method is determined to be the file name retrieval method, the method further includes the steps of: monitoring file changes in the computing device in real time; when a file change event is detected, generating a corresponding file change message based on the file change event; and updating the index entries in the index table based on the file change message.
  • The file manager includes a monitoring module and an index processing module. The monitoring module is adapted to monitor file changes in the computing device in real time, to generate a corresponding file change message when a file change event is detected, and to send the file change message to the index processing module. The index processing module is adapted to update the index entries in the index table based on the file change message.
  • Sequentially comparing the index values in the index entries with the retrieval information includes: comparing each index value with the retrieval information in turn using the strstr function.
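  • The traversal and comparison steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `IndexEntry` class and `search_index` function are hypothetical names, Python's `in` operator stands in for the C `strstr` substring test, and treating every path below the current directory as "under" it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    index_value: str  # e.g. a file name
    location: str     # path information of the corresponding file


def search_index(entries, current_dir, query):
    """Return (index_value, location) pairs under current_dir that contain query."""
    prefix = current_dir.rstrip("/") + "/"
    results = []
    for entry in entries:
        if not entry.location.startswith(prefix):
            continue  # entry lies outside the current directory
        # Substring test on the index value only, analogous to strstr() != NULL;
        # no file attribute information is consulted.
        if query in entry.index_value:
            results.append((entry.index_value, entry.location))
    return results


entries = [
    IndexEntry("report.txt", "/home/jerry/docs/report.txt"),
    IndexEntry("notes.md", "/home/jerry/notes.md"),
    IndexEntry("report_old.txt", "/tmp/report_old.txt"),
]
hits = search_index(entries, "/home/jerry", "report")
```

  • Here `hits` holds only the entry for `report.txt`, because `report_old.txt` is stored outside the directory being searched.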
  • The index table includes a file name index table corresponding to the file name retrieval method and a file content index table corresponding to the file content retrieval method. Before the retrieval request is received, the method further includes the steps of: creating the file name index table, in which the index value is a file name and the location information includes path information; and creating the file content index table, in which the index value is a word element (token) and the location information includes a linked list container that holds one or more file names.
  • The step of creating the file content index table includes: acquiring the content of each file and performing word segmentation on it to generate a plurality of word elements; establishing an association between each word element and the file name of the file it came from; generating, for each word element, a linked list container from the one or more file names associated with it; and generating the file content index table as an inverted index from the word elements and their linked list containers.
  • The method further includes the following steps: obtaining the word elements of the index entries under the current directory in the file content index table, together with the linked list container corresponding to each word element; traversing the modification time information recorded for each file name in the linked list container, and determining whether the recorded modification time is consistent with the actual modification time of the corresponding file stored in the computing device; and, if not, determining that the file has been modified, acquiring the modified file stored in the computing device, and updating the file content index table based on the modified file.
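  • The modification time check can be sketched as below. This is a hedged example: the function name `find_stale_files` and the representation of a container as a mapping from file path to recorded modification time are assumptions made for illustration, and treating a missing file as changed is also an assumption that the text does not address.

```python
import os


def find_stale_files(container):
    """container maps a file path to the modification time (in ns) recorded at
    indexing time. Returns the paths whose on-disk modification time differs."""
    stale = []
    for path, recorded_mtime_ns in container.items():
        try:
            actual_mtime_ns = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            # Assumption: a file that no longer exists also needs re-indexing.
            stale.append(path)
            continue
        if actual_mtime_ns != recorded_mtime_ns:
            stale.append(path)  # file was modified after it was indexed
    return stale
```

  • Only the files returned by such a check would need to be re-read and re-segmented before the file content index table is updated.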
  • When the retrieval method is both the file name retrieval method and the file content retrieval method, the method includes: traversing the index entries under the current directory in the file name index table, comparing the file name in each index entry with the retrieval information in turn, and determining one or more file names that match the retrieval information, so as to generate a first retrieval result; traversing the index entries under the current directory in the file content index table, comparing the word element in each index entry with the retrieval information in turn, determining one or more word elements that match the retrieval information, and determining the file names corresponding to those word elements, so as to generate a second retrieval result; and returning the first retrieval result and the second retrieval result to the client.
  • Before the index table is created, the file retrieval method includes the step of: converting files of one or more types into plain text format files.
  • The step of converting files of one or more types into plain text format files includes: acquiring the file; performing suffix detection on the file to determine the file type; and acquiring a parsing method corresponding to the file type and parsing the file with it to obtain the plain text content of the file.
  • The file retrieval method further includes the step of: if the suffix detection fails or the parsing fails, performing content detection on the file to obtain the plain text content of the file.
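  • A minimal sketch of suffix detection, parser dispatch and the content detection fallback follows. All names (`PARSERS`, `parse_txt`, `parse_html`, `detect_by_content`, `to_plain_text`) are hypothetical, only two toy formats are handled, and the sniffing rule used for content detection is an assumption; the text does not specify how content detection works.

```python
import os
import re


def parse_txt(data: bytes) -> str:
    return data.decode("utf-8")  # strict: raises on invalid UTF-8


def parse_html(data: bytes) -> str:
    # Crude tag stripping, for illustration only.
    text = re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))
    return " ".join(text.split())


# Suffix -> parsing method (a real table would cover many more formats).
PARSERS = {".txt": parse_txt, ".html": parse_html, ".htm": parse_html}


def detect_by_content(data: bytes) -> str:
    # Assumed fallback: sniff for HTML markup, otherwise decode leniently.
    if b"<html" in data[:256].lower():
        return parse_html(data)
    return data.decode("utf-8", errors="replace")


def to_plain_text(filename: str, data: bytes) -> str:
    suffix = os.path.splitext(filename)[1].lower()
    parser = PARSERS.get(suffix)
    if parser is not None:
        try:
            return parser(data)
        except ValueError:  # includes UnicodeDecodeError: parsing failed
            pass
    # Suffix detection failed or parsing failed: fall back to content detection.
    return detect_by_content(data)
```

  • For instance, `to_plain_text("page", b"<html><body>Hi</body></html>")` has no suffix to dispatch on, so it takes the fallback path and yields `"Hi"`.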
  • A computing device is provided, comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and include instructions for performing the file retrieval method described above.
  • a readable storage medium storing program instructions, which when read and executed by a computing device, cause the computing device to perform the method as described above.
  • A file retrieval method is provided that can retrieve files based on a file name retrieval method and/or a file content retrieval method.
  • With the file retrieval method of the present invention, files can be retrieved by file name, by file content, or by a combination of the two retrieval methods.
  • Moreover, the present invention only needs to compare the index value (file name or word element) with the retrieval information and does not need to compare file attribute information, which improves retrieval efficiency.
  • FIG. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention
  • FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention
  • FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention.
  • FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention.
  • FIG. 1 shows a schematic diagram of a computing device 100 according to one embodiment of the present invention.
  • computing device 100 typically includes system memory 106 and one or more processors 104 .
  • the memory bus 108 may be used for communication between the processor 104 and the system memory 106 .
  • The processor 104 may be any type of processor including, but not limited to, a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof.
  • Processor 104 may include one or more levels of cache, such as L1 cache 110 and L2 cache 112 , processor core 114 , and registers 116 .
  • Exemplary processor cores 114 may include arithmetic logic units (ALUs), floating point units (FPUs), digital signal processing cores (DSP cores), or any combination thereof.
  • the exemplary memory controller 118 may be used with the processor 104 , or in some implementations, the memory controller 118 may be an internal part of the processor 104 .
  • system memory 106 may be any type of memory including, but not limited to, volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof.
  • System memory 106 may include operating system 120 , one or more applications 122 , and program data 124 .
  • Applications 122 may be arranged to run on the operating system, with their instructions executed by one or more processors 104 using program data 124.
  • Computing device 100 may also include a storage interface bus 134 .
  • Storage interface bus 134 enables communication from storage devices 132 (eg, removable storage 136 and non-removable storage 138 ) to base configuration 102 via bus/interface controller 130 .
  • Operating system 120, applications 122, and at least a portion of program data 124 may be stored on removable storage 136 and/or non-removable storage 138, and are loaded into system memory 106 via the storage interface bus 134 and executed by one or more processors 104 when computing device 100 is powered on or applications 122 are to be executed.
  • Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (eg, output device 142 , peripheral interface 144 , and communication device 146 ) to base configuration 102 via bus/interface controller 130 .
  • Exemplary output devices 142 include graphics processing unit 148 and audio processing unit 150. They may be configured to facilitate communication via one or more A/V ports 152 with various external devices such as displays or speakers.
  • Example peripheral interfaces 144 may include serial interface controller 154 and parallel interface controller 156, which may be configured to facilitate communication via one or more I/O ports 158 with input devices (eg, keyboard, mouse, pen, voice input devices, touch input devices) or other peripherals (eg, printers, scanners, etc.).
  • the example communication device 146 may include a network controller 160 that may be arranged to facilitate communication via one or more communication ports 164 with one or more other computing devices 162 over a network communication link.
  • a network communication link may be one example of a communication medium.
  • Communication media may typically embody computer readable instructions, data structures, program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media may include wired media, such as wired or leased line networks, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media.
  • the term computer readable medium as used herein may include both storage media and communication media.
  • Computing device 100 may be implemented as a personal computer including a desktop computer and a notebook computer configuration.
  • Computing device 100 may also be implemented as part of a small form factor portable (or mobile) electronic device such as a cellular telephone, digital camera, personal digital assistant (PDA), personal media player device, wireless web browsing device, personal headset, application-specific device, or a hybrid device that includes any of the above functions.
  • It can even be implemented as a server, such as a file server, database server, application server, and WEB server. The embodiments of the present invention do not limit this.
  • the operating system of the computing device 100 is configured to execute the file retrieval method 200 according to the present invention.
  • the operating system of the computing device 100 includes a plurality of program instructions for executing the file retrieval method 200 according to the present invention.
  • the operating system of the computing device 100 includes a file manager configured to execute the file retrieval method 200 of the present invention.
  • FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention.
  • the computing device 100 includes a data storage device, and various files can be stored in the data storage device.
  • the present invention does not limit the specific types of files.
  • the method 200 starts at step S210.
  • In step S210, a retrieval request for a file, issued by a user at the client, is received.
  • The system desktop of the computing device 100 is adapted to present an interface corresponding to the file manager, so that the user can request file retrieval on that interface.
  • When the user enters retrieval information (eg, keywords) in a search box on the interface, a retrieval request for the file is sent to the file manager of the computing device based on the retrieval information.
  • When sending a retrieval request, the user may also select a retrieval method for the file.
  • The retrieval methods include a file name retrieval method and a file content retrieval method.
  • The user can select one of the two, retrieving files by either the file name retrieval method or the file content retrieval method, or can select a combination of the two, so that files are retrieved by the file name retrieval method and the file content retrieval method together.
  • In step S220, the retrieval method selected by the user, the retrieval information and the current directory are determined based on the user's retrieval request.
  • the retrieval method determined here may be a file name retrieval method and/or a file content retrieval method.
  • the retrieval of files in the file manager is usually based on the current file directory, and the file manager can determine the current directory according to the user's retrieval request.
  • the two retrieval methods provided by the present invention are respectively based on corresponding index tables to perform file retrieval.
  • A file name index table corresponding to the file name retrieval method and a file content index table corresponding to the file content retrieval method can be established in advance, so that the file manager of the present invention performs file retrieval using the index table that corresponds to the retrieval method.
  • In step S230, an index table corresponding to the retrieval method is determined. The index table includes a plurality of index entries, each of which includes an index value and corresponding file location information. The index entries in the index table under the current directory are then obtained. Retrieval of the file is realized by matching the index values of the index entries under the current directory against the retrieval information.
  • The index value in each index entry is, for example, a keyword corresponding to a file or to file content, and the location information in each index entry is the storage location of the file or file content corresponding to the index value.
  • In the file name index table, the index value may be the file name, and the location information corresponding to the index value includes path information of the file with that file name.
  • In the file content index table, the index value may be a word element generated by word segmentation of the file content. The word element may include a phrase and the parent directory path of the file in which the phrase occurs, and the location information includes the file name of the file corresponding to the word element.
  • Each index entry in the file content index table may include the file names of a plurality of files corresponding to its index value (word element).
  • each index value (word element) in the file content index table may correspond to one or more file names, and one or more file names constitute a linked list container corresponding to the word element.
  • In step S240, the index entries in the index table under the current directory are traversed, and the index value in each index entry is compared with the user's retrieval information in turn, so as to determine one or more index values that match the retrieval information.
  • the file corresponding to the index value matching the retrieval information is the retrieved target file.
  • In the file retrieval method of the present invention, only the index values in the index table need to be compared with the retrieval information; attribute information such as the creation time and modification time of the file need not be compared.
  • When retrieving by the file name retrieval method, only the file names under the current directory in the file name index table need to be compared with the user's retrieval information; when retrieving by the file content retrieval method, only the word elements under the current directory in the file content index table need to be compared with the user's retrieval information. Because only file names and word elements are compared, the retrieval efficiency is higher.
  • The present invention compares the index values in the index entries with the retrieval information in turn using the strstr function. This comparison requires no system calls, so its memory consumption is low and it is fast, which helps improve retrieval efficiency.
  • When the index entries are compared with the retrieval information, a predetermined number of index entries may be loaded into memory at a time and matched against the retrieval information, so that the index entries are loaded and compared part by part. This keeps the memory footprint as small as possible and further improves retrieval efficiency.
  • For example, the index entries can be read successively in chunks of a given byte size, and the word elements in each chunk are compared with the retrieval information in turn. In this way, high memory usage during retrieval based on file content is avoided.
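  • The part-by-part loading described above can be sketched with a generator. This is an illustration under assumptions: `iter_batches` and `search_in_batches` are hypothetical names, and an in-memory list stands in for the stored index; in practice the iterator would read entries from the index file, so that only one batch is resident at a time.

```python
from itertools import islice


def iter_batches(entries, batch_size):
    """Yield successive lists of at most batch_size entries."""
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def search_in_batches(entries, query, batch_size=2):
    matches = []
    for batch in iter_batches(entries, batch_size):
        # Only the current batch is held in memory while it is compared.
        matches.extend(value for value in batch if query in value)
    return matches
```

  • The batch size trades memory for the number of read operations; the text leaves the concrete number open ("a predetermined number").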
  • In step S250, a retrieval result is generated based on the one or more index values matching the retrieval information and the location information corresponding to each index value, and the retrieval result is returned to the client. The user can then view and obtain the corresponding target file based on the location information corresponding to each index value in the retrieval result.
  • the file can be retrieved based on the file name and the file can be retrieved based on the file content.
  • the file can also be retrieved based on the combination of the two retrieval methods of file name and file content.
  • the retrieval method requested by the user is determined to be a file name retrieval method and a file content retrieval method according to a user's retrieval request, that is, a combined retrieval of files is performed based on the file name retrieval method and the file content retrieval method. Then, in the end, the retrieval results obtained respectively from the two retrieval methods are returned to the client user as the final retrieval result.
  • the file name index table and the file content index table are obtained respectively. Further, traverse the index items in the file name index table under the current directory, compare the file names in the index items with the retrieval information in turn, and determine one or more file names that match the retrieval information, so as to generate a first retrieval result.
  • the first retrieval result is generated based on one or more file names matching the retrieval information and the location information of the file corresponding to each file name.
  • In addition, the index entries under the current directory in the file content index table are traversed, and the word element in each index entry is compared with the retrieval information in turn to determine one or more word elements that match the retrieval information, together with the location information (including one or more file names) corresponding to each matching word element. A second retrieval result is then generated based on the one or more matching word elements and the location information corresponding to each of them. Finally, the first retrieval result and the second retrieval result are returned to the client together.
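  • The combined retrieval can be sketched as two independent lookups whose results are returned together. The function names and the in-memory shapes of the two index tables are assumptions made for this example; word elements are written as "parent directory path + ':' + phrase", following the "/home/jerry/:today" form used in this description.

```python
def search_names(name_index, current_dir, query):
    """name_index: list of (file name, full path) pairs."""
    prefix = current_dir.rstrip("/") + "/"
    return sorted(
        path for name, path in name_index
        if path.startswith(prefix) and query in name
    )


def search_content(content_index, current_dir, query):
    """content_index: maps a word element to a list of file names."""
    prefix = current_dir.rstrip("/") + "/"
    hits = set()
    for element, names in content_index.items():
        parent, _, phrase = element.rpartition(":")
        if parent.startswith(prefix) and query in phrase:
            hits.update(parent + name for name in names)
    return sorted(hits)


def combined_search(name_index, content_index, current_dir, query):
    # First result: matches by file name; second result: matches by content.
    return {
        "by_name": search_names(name_index, current_dir, query),
        "by_content": search_content(content_index, current_dir, query),
    }


name_index = [
    ("weather.txt", "/home/jerry/weather.txt"),
    ("todo.txt", "/home/jerry/todo.txt"),
    ("weather.log", "/var/log/weather.log"),
]
content_index = {
    "/home/jerry/:weather": ["example.doc"],
    "/home/jerry/:today": ["example.doc", "plan.txt"],
    "/srv/:weather": ["x.txt"],
}
result = combined_search(name_index, content_index, "/home/jerry", "weather")
```

  • Keeping the two results separate, rather than merging them, mirrors the first and second retrieval results described above.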
  • Before the retrieval request is received, a file name index table is created, in which the index value of each index entry is a file name and the location information includes the path information of the corresponding file; and a file content index table is created, in which the index value of each index entry is a word element and the location information corresponding to the word element includes a linked list container that holds one or more file names corresponding to the word element.
  • all files of one or more types stored in the computing device are converted into plain text format files. Specifically, by acquiring various types of files locally stored on the computing device, suffix detection is performed on each file to determine the file type. Then, a parsing method corresponding to the detected and determined file type is acquired, and the file is parsed based on the parsing method, so that the plain text content in the file can be acquired. In a specific implementation manner, if the file suffix detection fails or the file parsing fails, content detection may be performed on the file to obtain the plain text content in the file.
  • The file types are, for example, MS Office series files, WPS Office series files, PDF files, e-mail files (EML) or hypertext files (HTML), but are not limited to the file types listed.
  • each index entry in the filename index table may further include file type information.
  • FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention.
  • the file name index table includes header data and file data, and the header data includes the root directory name.
  • the file data includes a plurality of directory information units, which are directory information unit 1, directory information unit 2, . . . directory information unit n, respectively.
  • each directory information unit corresponds to a directory
  • the directory information unit includes index entries corresponding to one or more files in the current directory
  • each index entry includes a corresponding file name field and a file type field.
  • For example, directory information unit 1 includes an index entry corresponding to file 1 (file 1 name field and file 1 type field) and an index entry corresponding to file 2 (file 2 name field and file 2 type field), but is not limited to this.
  • The file name field and the file type field constitute the directory content information of the directory information unit.
  • The file name field stores the name of a common file or directory, and the file type field has two cases.
  • When the entry is a common file, this field occupies one byte. When the entry is a directory, this field occupies four bytes, identifies the entry as a directory, and records the offset of the first file in that directory so that the directory can be traversed.
  • Each directory information unit also includes corresponding directory end information, which includes the directory end identifier of the current level.
  • The directory end identifier marks the end of the content information of the directory at this level. The directory end information also records the offset of the parent directory, from which the name of the parent directory can be obtained; combining it with the file name yields the full path of the file, and the file can then be obtained from its full path.
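  • How a full path can be recovered by following parent directory offsets is sketched below with a simplified in-memory model. This is not the on-disk layout of FIG. 3: the `DirUnit` class is hypothetical, and a list index stands in for the byte offset of the parent directory recorded in the directory end information.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirUnit:
    name: str                 # directory name ("/" for the root)
    parent: Optional[int]     # index of the parent unit, None for the root
    files: List[str] = field(default_factory=list)


def full_path(units, unit_idx, file_name):
    """Walk parent links up to the root and join the names into a full path."""
    parts = [file_name]
    idx = unit_idx
    while idx is not None:
        parts.append(units[idx].name)
        idx = units[idx].parent
    return posixpath.join(*reversed(parts))


units = [
    DirUnit("/", None, ["readme.txt"]),
    DirUnit("home", 0),
    DirUnit("jerry", 1, ["example.doc"]),
]
path = full_path(units, 2, "example.doc")
```

  • Storing only names plus a parent link keeps each entry small, at the cost of a short walk to the root when a match has to be turned into a path.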
  • each index entry includes a file name, a file type and corresponding file location information, and the location information includes a corresponding path information.
  • FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention.
  • the file content index table can be created according to the following method, and the specific steps include:
  • The file content of each file can be obtained through the Reader tool. Word segmentation of the file content generates multiple phrases, and a word element is generated by combining each phrase with the parent directory path of the file in which it occurs. That is, each word element includes a phrase and the parent directory path of the file where the phrase is located.
  • For example, if a piece of content in the file is "Today's weather is really nice, I'm going to climb a mountain!", word segmentation generates phrases such as "today", "weather", "good", "me" and "climbing". Each word element is then obtained by combining a phrase with the parent directory path of the current file. For example, if the "/home/jerry" directory contains a document "example.doc" with the above content, the following word elements are generated: "/home/jerry/:today", "/home/jerry/:weather", etc.
  • a linked list container corresponding to the token is generated based on one or more file names corresponding to each token.
  • each token corresponds to a linked list container, and the linked list container includes one or more file names corresponding to the token.
  • a file content index table is generated in an inverted index structure based on the multiple word elements and the linked list container corresponding to each word element.
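  • The creation steps above can be sketched as a small inverted index builder. Several things here are assumptions: the function names are hypothetical, a regular expression on word characters stands in for the word segmentation tool (which is not named in the text), and a Python list stands in for the linked list container.

```python
import posixpath
import re
from collections import defaultdict


def make_word_elements(path, content):
    """Combine each phrase with the parent directory path of the file."""
    parent = posixpath.dirname(path).rstrip("/") + "/"
    phrases = re.findall(r"\w+", content.lower())  # stand-in for real segmentation
    return {f"{parent}:{phrase}" for phrase in phrases}


def build_content_index(files):
    """files maps a path to its plain text content.
    Returns a mapping from word element to the list of file names containing it."""
    index = defaultdict(list)
    for path in sorted(files):
        name = posixpath.basename(path)
        for element in sorted(make_word_elements(path, files[path])):
            index[element].append(name)
    return dict(index)


files = {
    "/home/jerry/example.doc": "Today's weather is really nice",
    "/home/jerry/plan.txt": "Weather report for today",
}
index = build_content_index(files)
```

  • Because the parent directory path is part of each word element, the same phrase in two different directories produces two distinct index entries.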
  • the file content index table shown in FIG. 4 is a file content index table with an inverted index structure created according to the above-mentioned method for creating a file content index table.
  • the file content index table includes a plurality of index entries, and each index entry includes a word element and one or more file names corresponding to the word element.
  • In the first index entry, the index value is word element 1, and the file names corresponding to word element 1 include document 1, document 2, document 4, etc. In the second index entry, the index value is word element 2, and the file names corresponding to word element 2 include document 1, document 5, etc. Correspondingly, in the nth index entry, the index value is word element n, and the file names corresponding to word element n include document 3, etc.
  • each index entry in the file content index table further includes file attribute information.
  • the file attribute information includes, for example, information such as creation time, modification time, and file size of the file.
  • each word element includes a phrase and parent directory path information corresponding to the file where the phrase is located.
  • the linked list container corresponding to the token includes one or more file names corresponding to the token.
  • When the retrieval method is the file name retrieval method, the following method is also performed synchronously to update the file name index table:
  • File changes in the computing device are monitored in real time. When a file change event is detected, a corresponding file change message is generated based on the file change event, and the corresponding index entry in the file name index table is updated based on the file change message.
  • the file manager includes a monitoring module and an index processing module, and the monitoring module and the index processing module are connected in communication.
  • the monitoring module can monitor file changes in the computing device in real time. The specific types of file changes it monitors are creating files, deleting files, and changing file names.
  • when the monitoring module detects one of the above file change events, it generates a corresponding file change message based on that event.
  • the file change message is a file creation message, a file deletion message, or a file name change message. The monitoring module then sends the file change message to the index processing module, so that the index processing module updates the corresponding index entry in the file name index table based on the file change message.
  • when the index processing module receives a file change message from the monitoring module, it first determines the index value and location information of the changed file based on the message. It then uses the location information to find the file's index entry in the file name index table and updates that entry according to the specific type of change (create file, delete file, change file name).
  • for a file creation message, the location information that can be determined is the parent directory path of the newly created file. The file name index table is searched to find the position of the index entries under that parent directory path, and the file name of the new file is inserted there, so that a new index entry for the created file is added to the file name index table.
  • when the index processing module receives a file deletion message, it can determine the full path of the deleted file from the message. It searches the file name index table using the full path as the key to find the index entry corresponding to the deleted file, and then deletes that entry from the file name index table.
  • when the index processing module receives a file name change message, it can determine from the message the full path of the source file before the change and the full path of the new file after the change.
  • the file name index table is searched using the source file's path as the key to find the index entry corresponding to the source file, which is deleted; the table is then searched using the parent directory path of the new file as the key to find the position of the index entries under that parent directory path, and the new (changed) file name is inserted there, creating an index entry for the renamed file.
  • the retrieval mode is the file content retrieval mode
  • the following steps are performed to update the file content index table:
  • the modification time recorded for each file name in the linked list container is traversed and compared with the actual modification time of the corresponding file, obtained from the computing device, to determine whether the two are consistent.
  • the modified file stored in the computing device is acquired, and the file content index table is updated based on this latest version of the file.
  • if the file is modified, it is further determined whether the file was newly created or deleted. If it was newly created, word segmentation is performed on its content to create the corresponding word elements, and the new word elements and file name are inserted into the file content index table as new index entries, thereby updating the table. If it was deleted, the word elements corresponding to the file are deleted from the file content index table.
  • the index entries in the file content index table are updated for files newly created or deleted in the current directory. This ensures that retrieval matches the retrieval information against a file content index table that is consistent with the latest file state, so that the retrieval result reflects the current files and the user obtains accurate and valid target files.
  • a file can be retrieved based on a file name retrieval method and/or a file content retrieval method.
  • the present invention can realize the retrieval of files based on the file name and the content of the file, and can also retrieve the file based on the combination of the two retrieval methods of the file name and the file content.
  • the present invention only needs to compare the index value (file name or word element) with the retrieval information and does not need to compare the file attribute information, which helps improve retrieval efficiency.
  • the various techniques described herein can be implemented in conjunction with hardware or software, or a combination thereof.
  • the method and apparatus of the present invention may take the form of program code (i.e., instructions) embedded in a tangible medium, such as a removable hard disk, a USB stick, a floppy disk, a CD-ROM, or any other machine-readable storage medium.
  • when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
  • the computing device typically includes a processor, a storage medium readable by the processor (including volatile and nonvolatile memory and/or storage elements), at least one input device, and at least one output device.
  • the memory is configured to store program code; the processor is configured to execute the file retrieval method of the present invention according to the instructions in the program code stored in the memory.
  • readable media include readable storage media and communication media.
  • Readable storage media store information such as computer readable instructions, data structures, program modules or other data.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
  • modules or units or components of the apparatus in the examples disclosed herein may be arranged in the apparatus as described in this embodiment, or alternatively may be positioned differently from the apparatus in this example in one or more devices.
  • the modules in the preceding examples may be combined into one module or further divided into sub-modules.
  • modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • the modules or units or components in the embodiments may be combined into one module or unit or component, and they may further be divided into multiple sub-modules or sub-units or sub-components. All features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Abstract

Disclosed is a file retrieval method. The method comprises: receiving a file retrieval request from a client; determining a retrieval manner, retrieval information and the current catalogue on the basis of the retrieval request, wherein the retrieval manner comprises a file name retrieval manner and/or a file content retrieval manner; determining an index table corresponding to the retrieval manner, wherein the index table comprises a plurality of index items, and each index item comprises an index value and corresponding position information; traversing, in the index table, the index items under the current catalogue, and sequentially comparing the index values in the index items with the retrieval information, to determine one or more index values that match the retrieval information; and generating a retrieval result on the basis of the one or more index values that match the retrieval information and the corresponding position information, and returning the retrieval result to the client. Further disclosed is a corresponding computing device. By means of the file retrieval method of the present invention, a file can be retrieved according to a file name and file content, and the retrieval efficiency is thus high.

Description

A file retrieval method and computing device
Technical field
The invention relates to the field of computer technology, and in particular to a file retrieval method and a computing device.
Background
A file manager is one of the essential applications of a computer operating system, and file retrieval is a function that users often use in a file manager. Users can search for files in the file manager by keyword to quickly locate where specified files and directories are stored, which greatly improves their work efficiency. However, current file managers cannot search the content of files by a keyword specified by the user.
In the prior art, the Linux operating system ships with a file retrieval tool named Find, which can search files and directories by file name, file type, file permissions, modification time, and file size according to keywords. Find traverses every file under the specified directory: it reads the directory's file list, finds the corresponding inode information from the inode number in the list, retrieves the inode information, and matches it against the user's keyword. If the inode type is a directory, Find opens that directory, reads its file list, retrieves information and matches the keyword, and so on recursively. Because Find performs nested recursive loops, repeatedly opening directories and reading file list information, which requires system calls and memory copying, the retrieval process is time-consuming. Moreover, users generally search only by the name of a file or directory and do not care about the creation time, modification time, and so on in its inode information; but when Find searches by file name, the file name and the inode are bundled together, which reduces the efficiency of file name retrieval. In addition, Find cannot search file contents.
Therefore, a file retrieval method is needed to solve the problems in the above technical solutions.
Summary of the invention
To this end, the present invention provides a file retrieval method that seeks to solve, or at least alleviate, the problems described above.
According to one aspect of the present invention, a file retrieval method is provided, executed in a file manager of a computing device. The method comprises: receiving a client's retrieval request for a file; determining a retrieval mode, retrieval information, and a current directory based on the retrieval request, the retrieval mode including a file name retrieval mode and/or a file content retrieval mode; determining an index table corresponding to the retrieval mode, the index table including a plurality of index entries, each index entry including an index value and corresponding location information, and obtaining the index entries in the index table under the current directory; traversing the index entries in the index table under the current directory and comparing the index value in each entry in turn with the retrieval information to determine one or more index values that match the retrieval information; and generating a retrieval result based on the one or more matching index values and the corresponding location information, and returning the retrieval result to the client.
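The flow described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patent's implementation: `IndexEntry`, `search`, and the sample paths are hypothetical names, the index table is a plain list, and Python's `in` operator stands in for the substring comparison.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IndexEntry:
    """One index entry: an index value plus location information (hypothetical layout)."""
    value: str      # index value, e.g. a file name
    location: str   # location information, e.g. the parent directory path


def search(index_table, current_dir, query):
    """Return (value, location) pairs under current_dir whose value contains query."""
    root = PurePosixPath(current_dir)
    results = []
    for entry in index_table:
        loc = PurePosixPath(entry.location)
        # Only consider entries located in the current directory or below it.
        if loc != root and root not in loc.parents:
            continue
        # Compare only the index value with the retrieval information.
        if query in entry.value:
            results.append((entry.value, entry.location))
    return results


table = [
    IndexEntry("report.txt", "/home/user/docs"),
    IndexEntry("notes.md", "/home/user/docs/2021"),
    IndexEntry("report_old.txt", "/tmp"),
]
hits = search(table, "/home/user", "report")
```

Only the index value is compared with the query; attributes such as timestamps are never consulted during matching, which is the efficiency argument the text makes.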
Optionally, in the file retrieval method according to the present invention, when the retrieval mode is determined to be the file name retrieval mode, the method further includes the steps of: monitoring file changes in the computing device in real time; when a file change event is detected, generating a corresponding file change message based on the file change event; and updating the index entries in the index table based on the file change message.
Optionally, in the file retrieval method according to the present invention, the file manager includes a monitoring module and an index processing module. The monitoring module is adapted to monitor file changes in the computing device in real time, to generate a corresponding file change message based on a file change event when such an event is detected, and to send the file change message to the index processing module. The index processing module is adapted to update the index entries in the index table based on the file change message.
Optionally, in the file retrieval method according to the present invention, the file changes include creating a file, deleting a file, and changing a file name. The step of updating the index table based on the file change message includes: determining the index value and location information of the changed file based on the file change message; and determining the index entry corresponding to the file in the index table based on the location information, and updating that index entry.
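The update step can be illustrated with a short Python sketch. It assumes a hypothetical message format (a tuple whose first element names the change type) and models the file name index table as a mapping from parent directory path to a set of file names; neither detail comes from the source, and real change events would come from an operating system notification facility rather than a hard-coded list.

```python
import posixpath
from collections import defaultdict


def apply_change(name_index, message):
    """Update name_index (parent directory -> set of file names) from one change message."""
    kind = message[0]
    if kind == "create":
        # Location information is the parent directory; the index value is the file name.
        parent, name = posixpath.split(message[1])
        name_index[parent].add(name)
    elif kind == "delete":
        parent, name = posixpath.split(message[1])
        name_index[parent].discard(name)
    elif kind == "rename":
        # A rename removes the source entry, then inserts the new name
        # under the new file's parent directory.
        apply_change(name_index, ("delete", message[1]))
        apply_change(name_index, ("create", message[2]))
    else:
        raise ValueError(f"unknown change type: {kind}")


name_index = defaultdict(set)
for msg in [
    ("create", "/home/user/a.txt"),
    ("create", "/home/user/b.txt"),
    ("rename", "/home/user/a.txt", "/home/user/c.txt"),
    ("delete", "/home/user/b.txt"),
]:
    apply_change(name_index, msg)
```

Treating a rename as a delete followed by a create keeps the update logic to two primitive operations.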
Optionally, in the file retrieval method according to the present invention, comparing the index values in the index entries in turn with the retrieval information includes: comparing the index value in each index entry in turn with the retrieval information using the Strstr function.
Optionally, in the file retrieval method according to the present invention, the index table includes a file name index table corresponding to the file name retrieval mode and a file content index table corresponding to the file content retrieval mode. Before receiving the retrieval request, the method further includes the steps of: creating the file name index table, in which the index value is a file name and the location information includes path information; and creating the file content index table, in which the index value is a word element and the location information includes a linked list container, the linked list container including one or more file names.
Optionally, in the file retrieval method according to the present invention, the step of creating the file content index table includes: obtaining the content of each file, performing word segmentation on the content to generate a plurality of word elements, and establishing an association between each word element and the file name of the file; generating a linked list container corresponding to each word element based on the one or more file names corresponding to that word element; and generating the file content index table in an inverted index structure based on the word elements and their corresponding linked list containers.
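As an illustration of the inverted index structure, the following Python sketch builds a mapping from word element to file names. It makes simplifying assumptions that are not in the source: word segmentation is a lowercase split on word characters (real segmentation, especially for Chinese text, would need a proper tokenizer), a Python list stands in for the linked list container, and `tokenize` and `build_content_index` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive word segmentation: lowercase runs of word characters (an assumption)."""
    return re.findall(r"\w+", text.lower())


def build_content_index(files):
    """Map each word element to the list of file names whose content contains it."""
    index = defaultdict(list)
    for name, content in files.items():
        # dict.fromkeys de-duplicates word elements while keeping their order.
        for token in dict.fromkeys(tokenize(content)):
            index[token].append(name)
    return dict(index)


files = {
    "document 1": "index table lookup",
    "document 2": "file index",
    "document 3": "lookup cache",
}
content_index = build_content_index(files)
```

Looking up a word element then yields its file names directly, without scanning any file content at query time.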
Optionally, in the file retrieval method according to the present invention, if the retrieval mode is the file content retrieval mode, then before traversing the index entries under the current directory in the file content index table, the method includes the steps of: obtaining the word elements under the current directory in the file content index table and the linked list container corresponding to each word element; traversing the modification time information corresponding to each file name in the linked list container to determine whether the modification time recorded for each file name is consistent with the actual modification time of the corresponding file stored in the computing device; and, if they are not consistent, determining that the file has been modified, acquiring the modified file stored in the computing device, and updating the file content index table based on the modified file.
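The consistency check can be sketched in Python as below. The sketch assumes the index keeps a recorded modification time per file path in a dictionary (the source only says the time information is associated with each file name), and `find_stale` is a hypothetical helper; the temporary files exist only to make the example self-contained.

```python
import os
import tempfile


def find_stale(recorded_mtimes):
    """Return paths whose recorded modification time differs from the one on disk.

    A missing file is also reported, so that its index entries can be removed.
    """
    stale = []
    for path, recorded in recorded_mtimes.items():
        try:
            actual = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            stale.append(path)
            continue
        if actual != recorded:
            stale.append(path)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    fresh = os.path.join(tmp, "fresh.txt")
    changed = os.path.join(tmp, "changed.txt")
    for p in (fresh, changed):
        with open(p, "w") as f:
            f.write("v1")
    recorded = {p: os.stat(p).st_mtime_ns for p in (fresh, changed)}
    # Simulate a later modification by moving the file's mtime forward by ten seconds.
    st = os.stat(changed)
    os.utime(changed, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    stale_names = [os.path.basename(p) for p in find_stale(recorded)]
```

Files reported as stale would then be re-segmented (or removed, if deleted) before the index is searched.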
Optionally, in the file retrieval method according to the present invention, if the retrieval mode is both the file name retrieval mode and the file content retrieval mode, then: the file name index table and the file content index table are obtained respectively; the index entries under the current directory in the file name index table are traversed, the file name in each index entry is compared in turn with the retrieval information, and one or more file names that match the retrieval information are determined to generate a first retrieval result; the index entries under the current directory in the file content index table are traversed, the word element in each index entry is compared in turn with the retrieval information, one or more word elements that match the retrieval information are determined, and the file names corresponding to those word elements are determined to generate a second retrieval result; and the first retrieval result and the second retrieval result are returned to the client.
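A compact Python sketch of the combined mode follows. For brevity it omits the current-directory filter and uses simplified, hypothetical index shapes (a list of file names and a dictionary from word element to file names); `search_names` and `search_content` are illustrative names rather than anything defined in the source.

```python
def search_names(name_index, query):
    """First result: file names that contain the retrieval information."""
    return [name for name in name_index if query in name]


def search_content(content_index, query):
    """Second result: file names linked to any word element that contains the query."""
    found = []
    for token, names in content_index.items():
        if query in token:
            for name in names:
                if name not in found:  # a file may match through several word elements
                    found.append(name)
    return found


name_index = ["budget.txt", "notes.txt", "plan.txt"]
content_index = {"budget": ["notes.txt", "plan.txt"], "meeting": ["notes.txt"]}

first_result = search_names(name_index, "budget")
second_result = search_content(content_index, "budget")
```

The two results are kept separate here, matching the text's description of returning a first and a second retrieval result to the client.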
Optionally, in the file retrieval method according to the present invention, before creating the index table, the method includes the step of: converting files of one or more types into plain text format files.
Optionally, in the file retrieval method according to the present invention, the step of converting files of one or more formats into plain text format files includes: obtaining a file; performing suffix detection on the file to determine the file type; and obtaining a parsing method corresponding to the file type and parsing the file based on that method to obtain the plain text content of the file.
Optionally, the file retrieval method according to the present invention further includes the step of: if suffix detection or parsing of the file fails, performing content detection on the file to obtain the plain text content of the file.
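The suffix-then-content strategy can be sketched in Python as follows. The parser registry, the two toy parsers, and the content detection heuristic (valid UTF-8 with no NUL bytes) are all assumptions made for illustration; the source does not specify which formats are supported or how content detection works.

```python
import os


def parse_txt(data):
    return data.decode("utf-8")


def parse_csv(data):
    # Treat CSV as text and replace separators with spaces (illustrative only).
    return data.decode("utf-8").replace(",", " ")


PARSERS = {".txt": parse_txt, ".csv": parse_csv}  # hypothetical registry


def sniff_text(data):
    """Fallback content detection: accept data that is valid UTF-8 without NUL bytes."""
    if b"\x00" in data:
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def extract_text(file_name, data):
    """Pick a parser by suffix; fall back to content detection if that fails."""
    suffix = os.path.splitext(file_name)[1].lower()
    parser = PARSERS.get(suffix)
    if parser is not None:
        try:
            return parser(data)
        except UnicodeDecodeError:
            pass  # parsing failed; fall through to content detection
    return sniff_text(data)
```

For example, `extract_text("a.csv", b"x,y")` goes through the CSV parser, while a file with no recognized suffix is handled by `sniff_text`, which returns `None` for data that does not look like text.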
According to one aspect of the present invention, a computing device is provided, comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and include instructions for performing the file retrieval method described above.
According to one aspect of the present invention, a readable storage medium storing program instructions is provided; when the program instructions are read and executed by a computing device, they cause the computing device to perform the method described above.
According to the technical solution of the present invention, a file retrieval method is provided that can retrieve files based on a file name retrieval mode and/or a file content retrieval mode. Specifically, the file retrieval method of the present invention can retrieve files by file name and can also retrieve files by file content, and the two retrieval modes can be combined. In addition, when matching the index table against the retrieval information, the present invention only needs to compare the index value (file name or word element) with the retrieval information and does not need to compare file attribute information, which helps improve file retrieval efficiency.
Description of drawings
To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, the same reference numerals generally refer to the same parts or elements.
FIG. 1 shows a schematic diagram of a computing device 100 according to an embodiment of the present invention;
FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention; and
FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.
FIG. 1 shows a schematic diagram of a computing device 100 according to one embodiment of the present invention.
As shown in FIG. 1, in a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level 1 cache 110 and a level 2 cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to execute instructions on the operating system by the one or more processors 104 using the program data 124.
The computing device 100 may also include a storage interface bus 134. The storage interface bus 134 enables communication from storage devices 132 (e.g., removable storage 136 and non-removable storage 138) to the basic configuration 102 via a bus/interface controller 130. At least a portion of the operating system 120, the applications 122, and the data 124 may be stored on the removable storage 136 and/or the non-removable storage 138, and may be loaded into the system memory 106 via the storage interface bus 134 and executed by the one or more processors 104 when the computing device 100 is powered on or when an application 122 is to be executed.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media, such as a wired network or a dedicated-line network, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media.
The computing device 100 may be implemented as a personal computer, including desktop and notebook computer configurations. The computing device 100 may also be implemented as part of a small form factor portable (or mobile) electronic device, such as a cellular telephone, a digital camera, a personal digital assistant (PDA), a personal media player device, a wireless web browsing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. It may even be implemented as a server, such as a file server, a database server, an application server, or a web server. The embodiments of the present invention place no limitation on this.
In an embodiment according to the present invention, the operating system of the computing device 100 is configured to execute the file retrieval method 200 according to the present invention. The operating system of the computing device 100 contains a plurality of program instructions for executing the file retrieval method 200.
According to one embodiment, the operating system of the computing device 100 includes a file manager, and the file manager is configured to execute the file retrieval method 200 of the present invention.
FIG. 2 shows a flowchart of a file retrieval method 200 according to an embodiment of the present invention.
It should be noted that the computing device 100 includes a data storage device in which a wide variety of files can be stored. The present invention does not limit the specific types of files.
As shown in FIG. 2, the method 200 starts at step S210.
In step S210, a user's retrieval request for files is received from the client. Here, the system desktop of the computing device 100 is adapted to present an interface corresponding to the file manager, so that the user can request file retrieval through that interface. For example, the user enters retrieval information (e.g., a keyword) in a search box on the interface, and a retrieval request for files is sent to the file manager of the computing device based on the retrieval information.
In an embodiment of the present invention, when sending a retrieval request, the user may also select a retrieval method for the files. The retrieval methods include a file name retrieval method and a file content retrieval method. The user may select either one of the two methods and retrieve files by file name only or by file content only, or may select a combination of the two, that is, retrieve files by both file name and file content.
Subsequently, in step S220, the retrieval method selected by the user, the retrieval information, and the current directory are determined based on the user's retrieval request. It should be understood that the retrieval method determined here may be the file name retrieval method and/or the file content retrieval method. It should also be noted that a file manager usually retrieves files relative to the current file directory, and the file manager can determine the current directory from the user's retrieval request.
It should be noted that each of the two retrieval methods provided by the present invention performs file retrieval based on its own index table. Specifically, a file name index table corresponding to the file name retrieval method and a file content index table corresponding to the file content retrieval method can be established in advance, so that the file manager of the present invention can perform file retrieval based on the index table corresponding to the retrieval method selected by the user.
In step S230, the index table corresponding to the retrieval method is determined. The index table includes a plurality of index entries, and each index entry includes an index value and the corresponding file location information. The index entries of the index table that fall under the current directory are then obtained. Based on these index entries, files can be retrieved by matching the index values against the retrieval information.
Here, the index value in each index entry is, for example, a keyword corresponding to a file or to file content, and the location information in each index entry is the location at which the file or file content corresponding to the index value is stored. Specifically, in the file name index table corresponding to the file name retrieval method, the index value may be a file name, and the location information corresponding to the index value includes the path information of the file with that name. In the file content index table corresponding to the file content retrieval method, the index value may be a token generated by word segmentation of the file content. A token may include a phrase together with the parent directory path of the file in which the phrase occurs, and the location information includes the file name of the file corresponding to the token. Each index entry in the file content index table may include the file names of multiple files corresponding to the index value (token). Specifically, each index value (token) in the file content index table may correspond to one or more file names, and these file names form a linked list container corresponding to the token.
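The two index tables described above can be pictured with a minimal Python sketch. The class names `FileNameEntry` and `ContentEntry` and their field names are hypothetical, chosen only for illustration; the source does not specify an in-memory layout, and a plain Python list stands in for the linked list container.

```python
from dataclasses import dataclass, field


@dataclass
class FileNameEntry:
    """One entry of the file name index table (hypothetical layout)."""
    file_name: str  # index value
    path: str       # location information: path of the file


@dataclass
class ContentEntry:
    """One entry of the file content index table (hypothetical layout)."""
    token: str  # index value: "<parent directory path>:<phrase>"
    file_names: list = field(default_factory=list)  # stands in for the linked list container


# Sample data only; the file names and paths are illustrative.
name_index = [FileNameEntry("example.doc", "/home/jerry/example.doc")]
content_index = [ContentEntry("/home/jerry/:weather", ["example.doc"])]
```

In this sketch the token carries the parent directory and the container carries the file names, so the two together are enough to locate a file.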
Subsequently, in step S240, the index entries of the index table under the current directory are traversed, and the index value of each entry is compared in turn with the user's retrieval information to determine one or more index values that match the retrieval information. It should be understood that the files corresponding to the matching index values are the retrieved target files.
It should be noted that, according to the file retrieval method of the present invention, only the index values in the index table need to be compared with the retrieval information; attribute information such as the creation time and modification time of a file need not be compared. Specifically, when retrieving by file name, only the file names under the current directory in the file name index table are compared with the user's retrieval information; when retrieving by file content, only the tokens under the current directory in the file content index table are compared with the user's retrieval information. Because only file names and tokens are compared with the retrieval information, file retrieval is more efficient.
In one embodiment, the present invention uses the Strstr function to compare the index value of each index entry in turn with the retrieval information. This comparison does not require calling system-provided methods, so it consumes little memory and runs quickly, which helps improve retrieval efficiency.
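The traversal and comparison of step S240 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: it assumes that Strstr denotes a plain substring search, and uses Python's `in` operator as a stand-in. The function name `match_entries` and the sample data are hypothetical.

```python
def match_entries(entries, query):
    """Return the (index_value, location) pairs whose index value contains query.

    Only the index value is compared; file attributes are never consulted.
    """
    return [(value, location) for value, location in entries if query in value]


# Hypothetical index entries under the current directory.
entries_in_current_dir = [
    ("report.txt", "/home/jerry/report.txt"),
    ("notes.md", "/home/jerry/notes.md"),
    ("annual_report.pdf", "/home/jerry/annual_report.pdf"),
]
hits = match_entries(entries_in_current_dir, "report")
```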
In one embodiment, when traversing the index entries of the index table, a predetermined number of index entries may be loaded into memory at a time and matched against the retrieval information. Loading the index entries in portions keeps the memory footprint as small as possible, which further improves retrieval efficiency.
In addition, if the file content retrieval method is used, the index entries may be read successively by byte size when traversing the file content index table. For example, a number of tokens amounting to a predetermined number of bytes are loaded into memory each time, and the tokens are compared in turn with the retrieval information in memory. This avoids high memory usage when retrieving by file content.
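Batch-wise loading can be sketched with a generator that yields a fixed number of entries at a time. The helper names `iter_batches` and `search_in_batches` are hypothetical, and batching by entry count is used here for simplicity; a byte-budget variant would accumulate entries until a size limit is reached.

```python
from itertools import islice


def iter_batches(entries, batch_size):
    """Yield successive lists of at most batch_size entries."""
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def search_in_batches(entries, query, batch_size=2):
    """Match query against the entries while holding one batch in memory at a time."""
    hits = []
    for batch in iter_batches(entries, batch_size):
        hits.extend(value for value in batch if query in value)
    return hits
```

If `entries` is itself a lazy iterator over an on-disk table, only one batch is materialized at any moment.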
Finally, in step S250, a retrieval result is generated based on the one or more index values that match the retrieval information and the location information corresponding to each index value, and the retrieval result is returned to the client. The user can then view and obtain the corresponding target files based on the location information associated with each index value in the retrieval result.
It can be seen that, according to the file retrieval method 200 of the present invention, files can be retrieved by file name, by file content, or by a combination of the two retrieval methods.
According to an embodiment of the present invention, the user's retrieval request may indicate both the file name retrieval method and the file content retrieval method, that is, a combined retrieval of files by file name and by file content. In that case, the retrieval results obtained separately from the two retrieval methods are returned together to the client user as the final retrieval result.
Specifically, if the retrieval methods are determined to be both the file name retrieval method and the file content retrieval method, the file name index table and the file content index table are each obtained. The index entries of the file name index table under the current directory are then traversed, and the file name in each entry is compared in turn with the retrieval information to determine one or more matching file names. A first retrieval result is generated from the matching file names and the location information of the file corresponding to each file name. Similarly, the index entries of the file content index table under the current directory are traversed, and the token in each entry is compared in turn with the retrieval information to determine one or more matching tokens, together with the location information (including one or more file names) corresponding to each token. A second retrieval result is generated from the matching tokens and the location information corresponding to each token. Finally, the first retrieval result and the second retrieval result are returned to the client together.
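The combined retrieval can be sketched as two independent passes whose results are returned together. The dictionary layouts, the function name `combined_search`, and the result keys are assumptions made for this sketch; "under the current directory" is interpreted here as an exact directory match.

```python
def combined_search(name_index, content_index, current_dir, query):
    """Run both retrieval methods and return both results together.

    name_index:    {directory: [file_name, ...]}              (assumed layout)
    content_index: {(directory, phrase): [file_name, ...]}    (assumed layout)
    """
    # First retrieval result: matching file names with their full paths.
    first = [
        (name, current_dir + name)
        for name in name_index.get(current_dir, [])
        if query in name
    ]
    # Second retrieval result: matching tokens with the files that contain them.
    second = [
        (phrase, [directory + name for name in names])
        for (directory, phrase), names in content_index.items()
        if directory == current_dir and query in phrase
    ]
    return {"by_name": first, "by_content": second}


# Hypothetical sample data.
name_index = {"/home/jerry/": ["weather.txt", "example.doc"]}
content_index = {
    ("/home/jerry/", "weather"): ["example.doc"],
    ("/home/tom/", "weather"): ["other.doc"],
}
result = combined_search(name_index, content_index, "/home/jerry/", "weather")
```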
According to an embodiment of the present invention, before step S210 (receiving the user's retrieval request from the client) is performed, the file name index table is created. In the file name index table, the index value of each index entry is a file name, and the location information includes the path information of the corresponding file. The file content index table is also created. In the file content index table, the index value of each index entry is a token, and the location information corresponding to the token includes a linked list container that holds the one or more file names corresponding to the token.
It should be noted that, before the index tables are created, all files of one or more types stored in the computing device are converted into plain text. Specifically, the files of various types stored locally on the computing device are obtained, and suffix detection is performed on each file to determine its file type. The parsing method corresponding to the detected file type is then obtained, and the file is parsed with that method to extract its plain text content. In a specific implementation, if suffix detection fails or parsing fails, content detection may be performed on the file instead to obtain its plain text content.
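The suffix-based dispatch with a content-detection fallback can be sketched as below. This is a simplified illustration: the parser table, the function names, and the crude HTML tag stripping are all assumptions, and real parsers for office or PDF formats would be far more involved.

```python
import os
import re


def parse_plain(data: bytes) -> str:
    """Parser for plain text files; raises UnicodeDecodeError on invalid UTF-8."""
    return data.decode("utf-8")


def parse_html(data: bytes) -> str:
    """Crude tag stripping, for illustration only."""
    return re.sub(r"<[^>]+>", "", data.decode("utf-8")).strip()


# Hypothetical mapping from file suffix to parsing method.
PARSERS = {".txt": parse_plain, ".html": parse_html}


def extract_by_content(data: bytes) -> str:
    """Fallback: inspect the content itself instead of trusting the suffix."""
    text = data.decode("utf-8", errors="replace")
    if text.lstrip().lower().startswith(("<!doctype html", "<html")):
        return re.sub(r"<[^>]+>", "", text).strip()
    return text


def extract_text(file_name: str, data: bytes) -> str:
    suffix = os.path.splitext(file_name)[1].lower()
    parser = PARSERS.get(suffix)
    if parser is not None:
        try:
            return parser(data)
        except UnicodeDecodeError:
            pass  # parsing failed; fall through to content detection
    return extract_by_content(data)
```

A file without a recognized suffix, or one whose parser fails, is still indexed through `extract_by_content`.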
It should also be noted that the present invention does not limit the specific types of files. The file types include, for example, MS Office files, WPS Office files, PDF files, e-mail files (EML), and hypertext files (HTML), but are not limited to these.
In one embodiment, each index entry in the file name index table may further include file type information.
FIG. 3 shows a schematic diagram of a file name index table according to an embodiment of the present invention. As shown in FIG. 3, the file name index table includes header data and file data, and the header data includes the root directory name. The file data includes a plurality of directory information units: directory information unit 1, directory information unit 2, ..., directory information unit n. Each directory information unit corresponds to one directory and includes the index entries for the one or more files in that directory. Each index entry includes a file name field and a file type field. For example, directory information unit 1 includes the index entry for file 1 (file 1 name field and file 1 type field) and the index entry for file 2 (file 2 name field and file 2 type field), but is not limited to this. The file name fields and file type fields together constitute the directory content information of the directory information unit.
In one embodiment, the file name field stores the name of an ordinary file or of a directory, and the file type field takes one of two forms. When the entry is an ordinary file, the field occupies one byte and identifies the entry as a file. When the entry is a directory, the field occupies four bytes, identifies the entry as a directory, and records the offset of the first file in that directory so that the directory can be traversed.
In addition, each directory information unit includes directory end information, which contains an end identifier for the current directory level. The end identifier marks the end of the content information of the current directory. The directory end information also records the offset of the parent directory, so that the name of the parent directory can be obtained and combined with the file name to form the full path of the file, through which the file can be accessed.
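The way a full path is recovered by following parent offsets can be sketched as follows. The sketch replaces the byte-level layout of FIG. 3 with a Python list, so that an "offset" is simply a list position; the `DirUnit` class and `full_path` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirUnit:
    """Simplified directory information unit (hypothetical in-memory form)."""
    name: str
    parent: Optional[int]  # position of the parent unit; None for the root
    files: list = field(default_factory=list)


def full_path(units, unit_pos, file_name):
    """Walk the parent offsets up to the root and join the names into a path."""
    parts = [file_name]
    pos = unit_pos
    while pos is not None:
        parts.append(units[pos].name)
        pos = units[pos].parent
    return "/".join(reversed(parts))


units = [
    DirUnit("", None, ["home"]),           # root directory
    DirUnit("home", 0, ["jerry"]),
    DirUnit("jerry", 1, ["example.doc"]),
]
path = full_path(units, 2, "example.doc")
```

Because each unit stores only its own name and a pointer to its parent, directory names are not repeated for every file.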
It can be seen that the file name index table shown in FIG. 3 contains the index entries for files in multiple directories. Each index entry includes a file name, a file type, and the corresponding file location information, and the location information includes the corresponding path information.
FIG. 4 shows a schematic diagram of a file content index table according to an embodiment of the present invention. According to one embodiment, the file content index table can be created by the following steps:
First, the content of each file is obtained, word segmentation is performed on the content to generate multiple tokens, and an association is established between each token and the file name of the file. The present invention does not limit the file type. Word segmentation splits the file content into multiple tokens and removes punctuation marks and meaningless words. In one embodiment, the file content can be obtained with the Reader tool, word segmentation of the content produces multiple phrases, and each phrase is combined with the parent directory path of the file in which it occurs to generate the corresponding token. That is, each token includes a phrase and the parent directory path of the file in which the phrase occurs.
For example, suppose a passage in a file reads "今天天气真不错,我要去爬山!" ("The weather is really nice today, I am going to climb a mountain!"). Word segmentation of this passage produces phrases such as "today" (今天), "weather" (天气), "nice" (不错), "I" (我), and "climb a mountain" (爬山). Each phrase is then combined with the parent directory path of the current file to obtain a token. For example, if the directory "/home/jerry" contains a document "example.doc" that includes the above passage, tokens such as "/home/jerry/:today" and "/home/jerry/:weather" are generated.
Next, a linked list container is generated for each token from the one or more file names corresponding to that token. Each token corresponds to one linked list container, and the container holds the one or more file names corresponding to the token.
Finally, the file content index table is generated as an inverted index from the multiple tokens and the linked list container corresponding to each token.
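The steps above can be sketched as a small inverted-index builder. Several simplifications are assumed: the tokenizer handles only English text by splitting on non-alphanumeric characters (Chinese text would need a dedicated word segmenter), the stop-word list is a made-up sample, and Python lists stand in for the linked list containers.

```python
import re
from collections import defaultdict

# Sample stop words standing in for "meaningless words" (assumption).
STOP_WORDS = {"the", "is", "to", "a", "i", "am"}


def tokenize(text):
    """Split text into lowercase words, dropping punctuation and stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_content_index(files):
    """Build {token: [file_name, ...]} from (parent_dir, file_name, text) triples.

    Each token has the form "<parent_dir>:<phrase>".
    """
    index = defaultdict(list)
    for parent_dir, file_name, text in files:
        for phrase in dict.fromkeys(tokenize(text)):  # de-duplicate, keep order
            index[f"{parent_dir}:{phrase}"].append(file_name)
    return dict(index)


index = build_content_index([
    ("/home/jerry/", "example.doc", "The weather is nice today"),
    ("/home/jerry/", "plan.txt", "Climb the mountain today"),
])
```

A token shared by several files, such as `/home/jerry/:today` here, maps to all of their file names.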
It should be noted that the file content index table shown in FIG. 4 is an inverted index created according to the method described above. As shown in FIG. 4, the file content index table includes a plurality of index entries, and each index entry includes a token and the one or more file names corresponding to the token. For example, in the first index entry, the index value is token 1, and the file names corresponding to token 1 include document 1, document 2, document 4, and so on. In the second index entry, the index value is token 2, and the corresponding file names include document 1, document 5, and so on. Likewise, in the n-th index entry, the index value is token n, and the corresponding file names include document 3 and so on.
In one embodiment, each index entry in the file content index table further includes file attribute information, such as the creation time, modification time, and size of the file.
It should be understood that, in the created file content index table, each token includes a phrase and the parent directory path of the file in which the phrase occurs, while the linked list container corresponding to the token holds the one or more file names corresponding to the token. Thus, for each index entry in the file content index table, the parent directory path of the corresponding file can be determined from the token, and the one or more file names can be determined from the linked list container, so that the exact location of each file can be determined from the parent directory path and the file name.
According to one embodiment, when the retrieval method is the file name retrieval method, the following method is performed concurrently with the matching of the index values (file names) in the file name index table against the retrieval information, in order to update the file name index table:
File changes in the computing device are monitored in real time. When a file change event is detected, a corresponding file change message is generated based on the event, and the corresponding index entry in the file name index table is updated based on the message.
Specifically, the file manager includes a monitoring module and an index processing module that are communicatively connected. The monitoring module monitors file changes in the computing device in real time, namely the creation of files, the deletion of files, and the renaming of files. When the monitoring module detects one of these file change events, it generates a corresponding file change message, which is accordingly a file creation message, a file deletion message, or a file rename message. The monitoring module then sends the file change message to the index processing module, which updates the corresponding index entry in the file name index table based on the message.
In one embodiment, when the index processing module receives a file change message from the monitoring module and updates the index table accordingly, it first determines the index value and location information of the changed file from the message. It then uses the location information of the changed file to locate the corresponding index entry in the file name index table, and updates that entry according to the type of file change indicated by the message (file creation, file deletion, or file rename).
Specifically, when the index processing module receives a file creation message, the location information it can determine from the message is the parent directory path of the newly created file. The file name index table is searched with the parent directory path as the key to find the position of the index entries under that parent directory, and the file name of the new file is inserted there. In this way, a new index entry corresponding to the newly created file is inserted into the file name index table.
When the index processing module receives a file deletion message, it can determine the full path of the deleted file from the message. The file name index table is searched with the full path as the key to find the index entry corresponding to the deleted file, and that index entry is removed from the file name index table.
When the index processing module receives a file rename message, it can determine from the message the full path of the source file before the rename and the full path of the new file after the rename. The file name index table is first searched with the source file path as the key to find the index entry corresponding to the source file, and that entry is deleted. The table is then searched with the parent directory path of the new file as the key to find the position of the index entries under that parent directory, and the new (renamed) file name is inserted there, creating the index entry for the renamed file.
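The three update paths can be sketched with a dictionary keyed by parent directory. The message format (a dict with `type`, `path`, `old_path`, `new_path` keys) and the function name `apply_change` are assumptions for this sketch; the source does not define how file change messages are encoded, and the on-disk table of FIG. 3 is replaced by an in-memory mapping.

```python
import posixpath


def apply_change(index, message):
    """Update index ({parent_dir: set of file names}) from one change message."""
    kind = message["type"]
    if kind == "create":
        # Locate the parent directory, then insert the new file name under it.
        parent, name = posixpath.split(message["path"])
        index.setdefault(parent, set()).add(name)
    elif kind == "delete":
        # Locate the entry by full path and remove it.
        parent, name = posixpath.split(message["path"])
        index.get(parent, set()).discard(name)
    elif kind == "rename":
        # A rename is handled as delete-old followed by create-new.
        apply_change(index, {"type": "delete", "path": message["old_path"]})
        apply_change(index, {"type": "create", "path": message["new_path"]})
    else:
        raise ValueError(f"unknown change type: {kind}")


index = {"/home/jerry": {"a.txt"}}
apply_change(index, {"type": "create", "path": "/home/jerry/b.txt"})
apply_change(index, {"type": "rename",
                     "old_path": "/home/jerry/a.txt",
                     "new_path": "/home/jerry/c.txt"})
apply_change(index, {"type": "delete", "path": "/home/jerry/b.txt"})
```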
According to one embodiment, if the retrieval method is the file content retrieval method, the following steps are performed to update the file content index table before the index entries under the current directory in that table are traversed:
First, the tokens in the file content index table that fall under the current directory requested by the user are obtained, together with the linked list container corresponding to each token.
Subsequently, the modification time recorded in the linked list container for the file corresponding to each file name is traversed, and the actual modification time of the file corresponding to each file name is obtained from the computing device, in order to determine whether the recorded modification time is consistent with the actual modification time of the file stored in the computing device.
If the two are inconsistent, the file is determined to have been modified. The modified file stored in the computing device is obtained, and the file content index table is updated based on this latest version of the file.
Specifically, if a file is determined to have been modified, it is further determined whether the file was newly created or deleted. If the file was newly created, word segmentation is performed on its content to create the corresponding tokens, and the newly created tokens and the file name are inserted into the file content index table as new index entries, thereby updating the table. If the file was deleted, the tokens corresponding to the file are deleted from the file content index table. In this way, before files are retrieved based on the file content index table, the index entries for files created or deleted under the current directory are updated. This ensures that the retrieval information is matched against a file content index table that reflects the latest file state, so that the retrieval result is consistent with the current files and the user obtains accurate and valid target files.
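The consistency check on modification times can be sketched as below. The function name `find_stale` and the `{path: mtime}` layout are hypothetical; `os.path.getmtime` is used as the default way to read the actual modification time, and a file that no longer exists is treated as deleted. Detecting files that were never indexed would require a separate directory listing, which this sketch omits.

```python
import os


def find_stale(recorded_mtimes, stat_mtime=os.path.getmtime):
    """Compare recorded modification times with the actual ones.

    recorded_mtimes: {path: mtime stored in the index} (assumed layout).
    Returns (changed, deleted): paths whose mtime differs, and paths that
    no longer exist on disk.
    """
    changed, deleted = [], []
    for path, recorded in recorded_mtimes.items():
        try:
            actual = stat_mtime(path)
        except FileNotFoundError:
            deleted.append(path)
            continue
        if actual != recorded:
            changed.append(path)
    return changed, deleted
```

The `stat_mtime` parameter exists only so that the check can be exercised without touching the real file system; the caller would re-tokenize the changed files and drop the tokens of the deleted ones.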
It can be seen that, according to the file retrieval method of the present invention, files can be retrieved by the file name retrieval method and/or the file content retrieval method. Specifically, the present invention can retrieve files by file name, by file content, or by a combination of the two. In addition, when matching the index table against the retrieval information, only the index values (file names or tokens) need to be compared with the retrieval information, and file attribute information need not be compared, which helps improve file retrieval efficiency.
The various techniques described herein may be implemented in hardware, in software, or in a combination of the two. Thus, the method and device of the present invention, or certain aspects or portions of them, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard disks, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes a device for practicing the invention.
Where the program code is executed on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the file retrieval method of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, readable media include readable storage media and communication media. Readable storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, the algorithms and displays are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the examples of the present invention. The structure required to construct such systems is apparent from the description above. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the descriptions of specific languages above are given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules, units, or components of the device in the examples disclosed herein may be arranged in the device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may further be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the described functions. Thus, a processor with the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first", "second", "third", and so on to describe a common object merely indicates that different instances of similar objects are being referred to, and is not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention as described herein. It should also be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the invention is intended to be illustrative, not restrictive, of the scope of the invention, which is defined by the appended claims.

Claims (14)

  1. A file retrieval method, executed in a computing device, the method comprising:
    receiving a retrieval request for a file from a client;
    determining a retrieval mode, retrieval information, and a current directory based on the retrieval request, the retrieval mode comprising a file name retrieval mode and/or a file content retrieval mode;
    determining an index table corresponding to the retrieval mode, the index table comprising a plurality of index entries, each index entry comprising an index value and corresponding location information, and obtaining the index entries in the index table that are under the current directory;
    traversing the index entries in the index table that are under the current directory, and comparing the index values in the index entries with the retrieval information in turn to determine one or more index values matching the retrieval information; and
    generating a retrieval result based on the one or more index values matching the retrieval information and the corresponding location information, and returning the retrieval result to the client.
  2. The method of claim 1, wherein, when the retrieval mode is determined to be the file name retrieval mode, the method further comprises the steps of:
    monitoring file changes in the computing device in real time;
    when a file change event is detected, generating a corresponding file change message based on the file change event;
    updating the index entries in the index table based on the file change message.
  3. The method of claim 2, wherein the computing device includes a file manager, and the file manager includes a monitoring module and an index processing module;
    the monitoring module is adapted to monitor file changes in the computing device in real time, to generate a corresponding file change message based on a file change event when the file change event is detected, and to send the file change message to the index processing module;
    the index processing module is adapted to update the index entries in the index table based on the file change message.
  4. The method of claim 2, wherein the file changes include creating a file, deleting a file, and changing a file name; and the step of updating the index table based on the file change message comprises:
    determining the index value and location information of the changed file based on the file change message;
    determining the index entry corresponding to the file in the index table based on the location information, and updating the index entry.
  5. The method of any one of claims 1-4, wherein comparing the index values in the index entries with the retrieval information in turn comprises:
    comparing the index values in the index entries with the retrieval information in turn based on the strstr function.
  6. The method of claim 1, wherein the index table comprises a file name index table corresponding to the file name retrieval mode and a file content index table corresponding to the file content retrieval mode; and, before the retrieval request is received, the method further comprises the steps of:
    creating the file name index table, wherein the index value in the file name index table is a file name and the location information includes path information; and
    creating the file content index table, wherein the index value in the file content index table is a token and the location information includes a linked list container, the linked list container including one or more file names.
  7. The file retrieval method of claim 6, wherein the step of creating the file content index table comprises:
    obtaining the file content of each file, performing word segmentation on the file content to generate a plurality of tokens, and establishing an association between each token and the file name of the file;
    generating a linked list container corresponding to each token based on the one or more file names corresponding to that token;
    generating the file content index table in an inverted index structure based on the plurality of tokens and the linked list containers corresponding to the tokens.
  8. The method of claim 6 or 7, wherein, if the retrieval mode is the file content retrieval mode, the method comprises, before traversing the index entries in the file content index table that are under the current directory, the steps of:
    obtaining the tokens in the file content index table that are under the current directory, and obtaining the linked list containers corresponding to the tokens;
    traversing the modification time information corresponding to each file name in the linked list container to determine whether the modification time corresponding to each file name is consistent with the actual modification time of the corresponding file stored in the computing device;
    if they are inconsistent, determining that the file has been modified, obtaining the modified file stored in the computing device, and updating the file content index table based on the modified file.
  9. The file retrieval method of claim 6, wherein, if the retrieval mode is both the file name retrieval mode and the file content retrieval mode:
    the file name index table and the file content index table are obtained separately;
    the index entries in the file name index table that are under the current directory are traversed, and the file names in the index entries are compared with the retrieval information in turn to determine one or more file names matching the retrieval information, so as to generate a first retrieval result;
    the index entries in the file content index table that are under the current directory are traversed, the tokens in the index entries are compared with the retrieval information in turn to determine one or more tokens matching the retrieval information, and the file names corresponding to the tokens are determined, so as to generate a second retrieval result; and
    the first retrieval result and the second retrieval result are returned to the client.
  10. The file retrieval method of claim 6, wherein, before the index table is created, the method comprises the step of:
    converting files of one or more types into plain text format files.
  11. The file retrieval method of claim 10, wherein the step of converting files of one or more formats into plain text format files comprises:
    obtaining a file;
    performing suffix detection on the file to determine the file type;
    obtaining a parsing method corresponding to the file type, and parsing the file based on the parsing method to obtain the plain text content of the file.
  12. The file retrieval method of claim 11, further comprising the step of:
    if suffix detection on the file fails or parsing of the file fails, performing content detection on the file to obtain the plain text content of the file.
  13. A computing device, comprising:
    at least one processor; and
    a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, and the program instructions include instructions for performing the method of any one of claims 1-12.
  14. A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method of any one of claims 1-12.
PCT/CN2021/118423 2021-01-05 2021-09-15 File retrieval method and computing device WO2022148055A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110005235.XA CN112328548A (en) 2021-01-05 2021-01-05 File retrieval method and computing device
CN202110005235.X 2021-01-05

Publications (1)

Publication Number Publication Date
WO2022148055A1 true WO2022148055A1 (en) 2022-07-14

Family

ID=74302180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118423 WO2022148055A1 (en) 2021-01-05 2021-09-15 File retrieval method and computing device

Country Status (2)

Country Link
CN (1) CN112328548A (en)
WO (1) WO2022148055A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328548A (en) * 2021-01-05 2021-02-05 统信软件技术有限公司 File retrieval method and computing device
CN113127421A (en) * 2021-04-01 2021-07-16 山东英信计算机技术有限公司 Method and equipment for searching file content in storage system
CN113127519B (en) * 2021-04-30 2023-02-07 平安普惠企业管理有限公司 File query method and device, computer equipment and storage medium
CN113468119A (en) * 2021-05-31 2021-10-01 北京明朝万达科技股份有限公司 File scanning method and device
CN116340268A (en) * 2023-02-28 2023-06-27 上海安博通信息科技有限公司 File traversal method and device and processing equipment
CN117216006A (en) * 2023-11-07 2023-12-12 国网信息通信产业集团有限公司 File content retrieval method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050021512A1 (en) * 2003-07-23 2005-01-27 Helmut Koenig Automatic indexing of digital image archives for content-based, context-sensitive searching
CN101510211A (en) * 2009-03-31 2009-08-19 杭州华三通信技术有限公司 Multimedia data processing system and method
CN102184211A (en) * 2011-05-03 2011-09-14 成都市华为赛门铁克科技有限公司 File system, and method and device for retrieving, writing, modifying or deleting file
CN106776929A (en) * 2016-11-30 2017-05-31 北京锐安科技有限公司 A kind of method for information retrieval and device
CN112328548A (en) * 2021-01-05 2021-02-05 统信软件技术有限公司 File retrieval method and computing device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130246338A1 (en) * 2006-09-15 2013-09-19 Ashok Doddapaneni System and method for indexing a capture system
CN108595489A (en) * 2018-03-15 2018-09-28 北京雷石天地电子技术有限公司 A kind of data retrieval method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573704A (en) * 2024-01-17 2024-02-20 上海合见工业软件集团有限公司 Method, device, equipment and medium for indexing composite document of EDA software
CN117573704B (en) * 2024-01-17 2024-04-12 上海合见工业软件集团有限公司 Method, device, equipment and medium for indexing composite document of EDA software

Also Published As

Publication number Publication date
CN112328548A (en) 2021-02-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21917098

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE