CN115794745B - File searching method, system, equipment and storage medium - Google Patents

Publication number
CN115794745B
CN115794745B CN202310043240.9A CN202310043240A CN115794745B CN 115794745 B CN115794745 B CN 115794745B CN 202310043240 A CN202310043240 A CN 202310043240A CN 115794745 B CN115794745 B CN 115794745B
Authority
CN
China
Prior art keywords
file
search
class
search keyword
names
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310043240.9A
Other languages
Chinese (zh)
Other versions
CN115794745A (en)
Inventor
宋昆鸿
唐盛
李能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Livefan Information Technology Co ltd
Original Assignee
Livefan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Livefan Information Technology Co ltd
Priority to CN202310043240.9A
Publication of CN115794745A
Application granted
Publication of CN115794745B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file searching method, system, equipment and storage medium. The method comprises: receiving a search keyword input by a user; parsing the search keyword to obtain its element composition form; generating a file name classification strategy adapted to the element composition form; scanning all stored files in a file storage area and generating a file index of all the stored files based on the file name classification strategy; searching the file index, by element-by-element matching, for a file name containing the search keyword; when such a file name is found, acquiring the file storage address corresponding to the file name; and obtaining the target file based on the file storage address. Because the file index is generated from the search keyword and classifies the file names, fewer file names need to be matched and search efficiency improves.

Description

File searching method, system, equipment and storage medium
Technical Field
The present invention relates to the field of file searching technologies, and in particular, to a method, a system, an apparatus, and a storage medium for searching files.
Background
With the rapid development of the computer age, computers have become increasingly capable and can store ever more data and files. When a computer holds a large number of files and a particular file is needed, the file can be opened directly from its directory if its location is known. If only the name is known and the location is not, the file can be found with the computer's search function. Searching is divided into fuzzy searching and full-text searching. Fuzzy searching means that only part of the name is known, so every file found to contain the keyword is listed for the user to choose from. Full-text searching means that the complete name of the file is known, and the specified file is located exactly by that name.
At present, when a computer performs a full-text search, it basically uses recursion: starting from each folder, the method searches the subfolders beneath it, and if the file is not found in one subfolder it moves on to the next subfolder under that folder, and so on. This is the most common search method, but it is slow, and when there are many folders and the subfolders are deeply nested the search is time-consuming. The present method speeds up file searching through a search algorithm and can be used in a variety of search scenarios on a computer.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The main object of the present invention is to provide a method, a system, a device and a storage medium for searching files, so as to solve the above-mentioned problems in the prior art.
The first aspect of the invention discloses a file searching method, which comprises the following steps:
receiving search keywords input by a user;
analyzing the search keywords to obtain element composition forms of the search keywords;
generating a file name classification strategy adapted to the element composition form based on the element composition form;
scanning all storage files in a file storage area, and generating file indexes of all the storage files based on the file name classification strategy;
searching whether file names containing the search keywords exist in the file index in a mode of element-by-element matching;
when the file name comprising the search keyword is found, acquiring a file storage address corresponding to the file name;
obtaining a target file based on the file storage address;
the file name classification strategy comprises the following steps: for file names of all stored files in the file storage area, acquiring the element composition forms of the file names, classifying the file names with the element composition forms identical to the search keywords into a first class, classifying the file names with the element composition forms different from the search keywords into a second class, and performing subdivision at least once in the first class according to the types of all the composition elements of the search keywords.
In an optional implementation manner of the first aspect of the present invention, the parsing the search keyword to obtain an element composition form of the search keyword includes:
traversing each component element of the search keyword, and recording the type of each component element, wherein the type of each component element comprises Chinese characters, letters, numbers and symbols;
and counting the types of the obtained constituent elements to obtain the element composition form of the search keyword, wherein the element composition forms comprise pure Chinese characters, pure letters, pure numbers, pure symbols, Chinese character-letter combinations, Chinese character-number combinations, Chinese character-symbol combinations, letter-number combinations, letter-symbol combinations, Chinese character-letter-number combinations, Chinese character-letter-symbol combinations and letter-number-symbol combinations.
In an optional implementation manner of the first aspect of the present invention, the performing at least one subdivision in the first class according to the types of the constituent elements of the search keyword includes:
acquiring the type of the first component element of the search keyword;
judging whether the type of the first component element of the search keyword belongs to the type of a partitionable interval or not, wherein the type of the partitionable interval comprises Chinese characters, letters and numbers;
if the type of the first component element of the search keyword belongs to the type of the partitionable interval, a plurality of subclasses are further partitioned in the first class.
In an optional implementation manner of the first aspect of the present invention, the dividing the first class into a plurality of subclasses further includes:
counting the number of the file names belonging to the first class;
acquiring an expected value of the number of preset classified files;
dividing the number by the expected value and rounding to obtain a dividing value of the subclass;
and dividing a plurality of subclasses in the first class by taking the division value as the division number of the subclasses.
In an optional implementation manner of the first aspect of the present invention, the generating file indexes of all storage files based on the file name classification policy includes:
scanning all storage files in a file storage area to obtain file names of all the storage files;
determining a file classification in the file index based on the file name classification policy;
matching the file names one by one, and obtaining the file classifications corresponding to the file names in the file indexes;
and writing each file name and a file storage address corresponding to each file name into the file index according to the file classification corresponding to each file name in the file index.
In an optional implementation manner of the first aspect of the present invention, the searching whether the file name containing the search keyword exists in the file index in an element-by-element matching manner includes:
determining a minimum target file classification of the search keyword in the file index based on each component element of the search keyword;
element-by-element matching is carried out on each component element of the search keyword and each file name in the minimum target file classification;
when a file name containing every constituent element of the search keyword is matched, taking that file name as the target file name;
and acquiring a file storage address corresponding to the target file name, and acquiring a target file based on the file storage address.
A second aspect of the present invention provides a file search system, the file search system comprising:
the receiving module is used for receiving search keywords input by a user;
the keyword analysis module is used for analyzing the search keywords to obtain element composition forms of the search keywords;
the classification strategy generation module is used for generating a file name classification strategy which is adapted to the element composition form based on the element composition form;
the file scanning and index generating module is used for scanning all storage files in the file storage area and generating file indexes of all the storage files based on the file name classification strategy;
the file name matching module is used for searching whether the file names containing the search keywords exist in the file index in an element-by-element matching mode;
the file storage address acquisition module is used for acquiring a file storage address corresponding to the file name when the file name comprising the search keyword is found;
the target file acquisition module is used for acquiring a target file based on the file storage address;
the file name classification strategy comprises the following steps: for file names of all stored files in the file storage area, acquiring the element composition forms of the file names, classifying the file names with the element composition forms identical to the search keywords into a first class, classifying the file names with the element composition forms different from the search keywords into a second class, and performing subdivision at least once in the first class according to the types of all the composition elements of the search keywords.
A third aspect of the present invention provides a file search apparatus comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the file searching apparatus to perform the file searching method according to any one of the preceding claims.
A fourth aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a file searching method as claimed in any one of the preceding claims.
Beneficial effects: the invention provides a file searching method, system, equipment and storage medium. The method comprises: receiving a search keyword input by a user; parsing the search keyword to obtain its element composition form; generating a file name classification strategy adapted to the element composition form; scanning all stored files in a file storage area and generating a file index of all the stored files based on the file name classification strategy; searching the file index, by element-by-element matching, for a file name containing the search keyword; when such a file name is found, acquiring the file storage address corresponding to the file name; and obtaining the target file based on the file storage address. Because the file index is generated from the search keyword and classifies the file names, fewer file names need to be matched and search efficiency improves.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the file searching method of the present invention;
FIG. 2 is a schematic diagram of an embodiment of the file search system of the present invention;
FIG. 3 is a schematic diagram of an embodiment of the file searching apparatus of the present invention.
Detailed Description
The embodiment of the invention provides a file searching method, a system, equipment and a storage medium. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, the first aspect of the present invention discloses a file searching method, which comprises the following steps:
S100, receiving a search keyword input by a user; in the invention, when a user needs to search for a file, the name of the file is entered in a preset input box; because file names are generally long, the user usually cannot remember the complete file name and therefore usually searches by entering a search keyword;
S200, parsing the search keyword to obtain the element composition form of the search keyword; in the invention, a search keyword may consist of Chinese characters, letters, numbers or symbols, or of a combination of several of these constituent elements; the element composition form is analyzed so that, when the file names of all stored files in the file storage area are classified and screened according to that form, the file names worth matching can be identified and only those file names are matched;
S300, generating a file name classification strategy adapted to the element composition form based on the element composition form; in the invention, after the element composition form of the search keyword has been obtained, a strategy for classifying the file names of all stored files in the file storage area is generated from it, the aim being to reduce the number of file names that must be matched later;
S400, scanning all stored files in the file storage area, and generating a file index of all the stored files based on the file name classification strategy; in the invention, the search keyword is not compared against the file names of the stored files directly in the storage area; instead, the file names of all stored files are first collected and classified into a file index, and the comparison is carried out in that index, so that the positions of the files in the file storage area are not changed;
S500, searching the file index, by element-by-element matching, for a file name containing the search keyword; in the invention, once the file index classified by file name has been obtained, the class of the file index in which the search keyword falls can be roughly determined from the constituent elements of the search keyword, and the search keyword is then matched against the file names of the stored files within that class;
S600, when a file name containing the search keyword is found, acquiring the file storage address corresponding to the file name; in the file index of the invention, each file name is bound to a hyperlink (namely the file storage address) of the corresponding file, so that once a suitable file name has been matched, the location of the target file can be reached directly through the hyperlink of the file name and the desired target file obtained;
S700, obtaining the target file based on the file storage address. In the invention, after a suitable file name has been matched, the search system generates a hyperlink based on the file name below the input box, and clicking the file name jumps to the location of the target file.
In an optional implementation manner of the first aspect of the present invention, the parsing the search keyword to obtain an element composition form of the search keyword includes:
traversing each constituent element of the search keyword, and recording the type of each constituent element, wherein the types of constituent elements comprise Chinese characters, letters, numbers and symbols; in the present invention, each Chinese character, letter, number and symbol of a search keyword is called a constituent element, and traversing means analyzing the constituent elements one by one from left to right; for example, if the first constituent element is determined to be a Chinese character, "1: Chinese character" is recorded; if the second constituent element is determined to be a number, "2: number" is recorded; and the remaining constituent elements are handled in the same way.
And counting the types of the obtained constituent elements to obtain the element composition form of the search keyword, wherein the element composition forms comprise pure Chinese characters, pure letters, pure numbers, pure symbols, Chinese character-letter combinations, Chinese character-number combinations, Chinese character-symbol combinations, letter-number combinations, letter-symbol combinations, Chinese character-letter-number combinations, Chinese character-letter-symbol combinations and letter-number-symbol combinations. In the invention, after the types of all constituent elements have been collected in the previous step, they are counted to obtain the element composition form of the search keyword, and in the later classification all file names are classified according to that element composition form.
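The traversal and counting described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the names `element_type` and `parse_keyword` are hypothetical, Chinese characters are assumed to lie in the CJK Unified Ideographs block only, and the composition form is represented as the set of types present, which also covers combinations the text does not list.

```python
def element_type(ch):
    """Classify one constituent element (a single character)."""
    if "\u4e00" <= ch <= "\u9fff":       # assumption: basic CJK block only
        return "hanzi"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "symbol"                      # everything else is treated as a symbol


def parse_keyword(keyword):
    """Return (per-position records, element composition form).

    Records are (position, type) pairs, numbered from 1 and read from
    left to right, mirroring the "1: Chinese character" example.
    """
    records = [(i, element_type(ch)) for i, ch in enumerate(keyword, start=1)]
    form = frozenset(t for _, t in records)   # the set of types present
    return records, form


records, form = parse_keyword("报告2023")
print(records[:3])
print(sorted(form))
```

Representing the form as a set means that "pure letters" is `{"letter"}` and a Chinese character-number combination is `{"hanzi", "digit"}`, so two names can be compared for "same composition form" with a plain equality test.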
In an optional implementation manner of the first aspect of the present invention, the file name classification policy includes:
for the file names of all stored files in the file storage area, acquiring the element composition form of each file name, classifying the file names whose element composition form is identical to that of the search keyword into a first class, classifying the file names whose element composition form differs from that of the search keyword into a second class, and performing at least one subdivision in the first class according to the types of the constituent elements of the search keyword. In the invention, the file names of the stored files in the file storage area are classified according to whether their element composition form is the same as that of the search keyword. For example, if the element composition form of the search keyword is a Chinese character-letter-symbol combination, the file names that are also Chinese character-letter-symbol combinations are grouped into one class and recorded in the file index, where the class heading can be named after the element composition form; file names with any other element composition form are grouped into another class. Because many file names may still share the element composition form of the search keyword after this classification, the first class is subdivided further according to the type of the first constituent element of the search keyword, so that only part of the file names has to be matched; this reduces the number of samples to match and improves matching efficiency.
In an optional implementation manner of the first aspect of the present invention, the performing at least one subdivision in the first class according to the types of the constituent elements of the search keyword includes:
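The split into a first and a second class might look like the following sketch. The helper names are hypothetical, file names are taken as plain strings exactly as given (how extensions are treated is not stated in the text), and the composition form is again modelled as the set of element types present.

```python
def element_type(ch):
    """Type of one constituent element; CJK range is an assumption."""
    if "\u4e00" <= ch <= "\u9fff":
        return "hanzi"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "symbol"


def composition_form(text):
    """Element composition form as the set of types that occur in text."""
    return frozenset(element_type(ch) for ch in text)


def classify(file_names, keyword):
    """Split file names into (first_class, second_class).

    first_class: names whose composition form equals the keyword's form.
    second_class: all other names.
    """
    target = composition_form(keyword)
    first_class, second_class = [], []
    for name in file_names:
        if composition_form(name) == target:
            first_class.append(name)
        else:
            second_class.append(name)
    return first_class, second_class


first, second = classify(["报告2023", "abc", "预算2024", "a1"], "总结2022")
print(first)   # names made of Chinese characters and digits
print(second)  # everything else
```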
acquiring the type of the first constituent element of the search keyword; in the invention, once the first class and the second class have been determined, the first class may still contain a very large number of file names, so the matching samples in the first class are subdivided at least once, mainly according to the type of the first constituent element;
judging whether the type of the first constituent element of the search keyword is a partitionable-interval type, wherein the partitionable-interval types comprise Chinese characters, letters and numbers; in the present invention, before the file name samples in the first class are subdivided further, it must be determined whether the first constituent element is of a type that can be subdivided; Chinese characters (converted to the first letter of their pinyin) and letters follow the order A(a)-Z(z), and numbers follow the order 0-9, so A(a)-Z(z) and 0-9 can each be divided into several consecutive segments; symbols have no such order, so there is no rule by which to subdivide them, and this case is defined as a non-partitionable interval;
if the type of the first constituent element of the search keyword is a partitionable-interval type, further dividing the first class into a plurality of subclasses. In the present invention, if the type of the first constituent element of the search keyword is a partitionable-interval type, the first class is divided further into subclasses; taking a number (0-9) as the type of the first constituent element, for example, the first class is further divided so that the numbers 0-3 form one class, 4-7 form one class, and 8 and 9 form one class; the main purpose of this step is to further reduce the number of file names to be matched.
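As a rough sketch of the partitionable-interval check, the snippet below tests the first constituent element and looks up its subclass using the 0-3 / 4-7 / 8-9 example from the text. All names are hypothetical. Converting a Chinese character to the first letter of its pinyin would need an external mapping, so it is left out here; that omission is an assumption of the sketch, not part of the described method.

```python
PARTITIONABLE = {"hanzi", "letter", "digit"}   # symbols have no usable order

# Subclass intervals for a numeric first element, as in the example above.
DIGIT_SUBCLASSES = ["0123", "4567", "89"]


def element_type(ch):
    """Type of one constituent element; CJK range is an assumption."""
    if "\u4e00" <= ch <= "\u9fff":
        return "hanzi"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "symbol"


def can_subdivide(keyword):
    """True if the first constituent element is of a partitionable type."""
    return bool(keyword) and element_type(keyword[0]) in PARTITIONABLE


def digit_subclass(first_char):
    """Index of the subclass whose interval contains first_char, else None."""
    for i, interval in enumerate(DIGIT_SUBCLASSES):
        if first_char in interval:
            return i
    return None


print(can_subdivide("2023报告"), can_subdivide("_tmp"))
print(digit_subclass("5"))
```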
In an optional implementation manner of the first aspect of the present invention, the dividing the first class into a plurality of subclasses further includes:
counting the number of file names belonging to the first class; in the invention, so that each subclass holds a number of file names small enough to yield a matching result quickly, and so that the subclasses are reasonably uniform in size, the subclasses are divided by a definite method based on the number of file names belonging to the first class;
acquiring a preset expected value of the number of files per class; in the invention, to improve matching speed, the number of file names to be matched has an ideal value (also called the expected value); with a class of about this size, a result is obtained relatively quickly whether or not a matching file name is found;
dividing the number by the expected value and rounding to obtain the division value of the subclasses; in the invention, after the preset expected value has been acquired, the number of file names belonging to the first class is divided by the expected value to obtain the division value of the subclasses; when rounding, rounding up to the larger value is preferred, which in turn determines the number of file names in each class;
and dividing the first class into a plurality of subclasses, taking the division value as the number of subclasses. In the present invention, if the division value is 4, the first class is divided into four subclasses; taking the numbers 0-9 as an example, the division into 4 may be 0-2 as one subclass, 3-5 as one subclass, 6-8 as one subclass, and 9 as one subclass.
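The arithmetic above can be illustrated with a short sketch. The function names and the sample figures (350 file names, an expected value of 100) are made up for illustration; the interval split uses ceiling-sized chunks, which is one simple scheme that happens to reproduce the 0-2 / 3-5 / 6-8 / 9 example and is not necessarily the scheme the inventors use.

```python
import math


def division_value(first_class_count, expected_per_class):
    """Number of subclasses: count / expected value, rounded up."""
    return max(1, math.ceil(first_class_count / expected_per_class))


def split_ordered(symbols, parts):
    """Cut an ordered symbol string into consecutive intervals.

    Each interval holds ceil(len / parts) symbols except possibly the last.
    For some inputs this yields fewer intervals than requested.
    """
    size = math.ceil(len(symbols) / parts)
    return [symbols[i:i + size] for i in range(0, len(symbols), size)]


parts = division_value(350, 100)           # hypothetical figures
print(parts)                               # 4
print(split_ordered("0123456789", parts))  # ['012', '345', '678', '9']
```

With three subclasses the same helper gives `['0123', '4567', '89']`, matching the earlier 0-3 / 4-7 / 8-9 example.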
In an optional implementation manner of the first aspect of the present invention, the generating file indexes of all storage files based on the file name classification policy includes:
scanning all stored files in the file storage area to obtain the file names of all the stored files; in the invention, once the file name classification strategy has been obtained, the file names of all stored files are first obtained by scanning so that they can be classified according to the strategy;
determining the file classifications in the file index based on the file name classification strategy; in this step, the file classifications in the file index are determined from the first class, the second class and the way in which the first class is subdivided, and the classification is based on the number of file names in each class;
matching the file names one by one, and obtaining the file classification corresponding to each file name in the file index; after the final file classifications in the file index have been determined in the previous step, they are used as the basis for determining the file classification to which each file name corresponds;
and writing each file name, together with the file storage address corresponding to it, into the file index under the file classification corresponding to that file name. In the invention, after the file classifications in the file index have been determined, the file names are read one by one, the specific classification to which each file name belongs is determined, and the file name is written into the corresponding file classification.
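The index-building steps above can be sketched as follows. Everything here is illustrative: `scan`, `build_index` and the dictionary layout are hypothetical, the index is an in-memory `dict` rather than an index file, and classifying the name without its extension is an assumption of this sketch (the text does not say how extensions are handled).

```python
import os
from collections import defaultdict


def element_type(ch):
    """Type of one constituent element; CJK range is an assumption."""
    if "\u4e00" <= ch <= "\u9fff":
        return "hanzi"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "symbol"


def composition_form(text):
    return frozenset(element_type(ch) for ch in text)


def scan(root):
    """Yield (name without extension, full path) for every file under root."""
    for folder, _dirs, files in os.walk(root):
        for fname in files:
            yield os.path.splitext(fname)[0], os.path.join(folder, fname)


def subclass_of(name, intervals):
    """Index of the interval containing the first element, else None."""
    first = name[0].lower() if name else ""
    for i, interval in enumerate(intervals):
        if first and first in interval:
            return i
    return None


def build_index(entries, keyword, intervals):
    """Map each classification to a list of (file name, storage address)."""
    target = composition_form(keyword)
    index = defaultdict(list)
    for name, path in entries:
        if composition_form(name) == target:
            key = ("first", subclass_of(name, intervals))
        else:
            key = ("second", None)
        index[key].append((name, path))
    return dict(index)


entries = [("2023report", "/a/2023report.txt"),
           ("7days", "/b/7days.md"),
           ("notes", "/c/notes.txt")]
index = build_index(entries, "9lives", ["0123", "4567", "89"])
print(sorted(index))
```

In real use the `entries` argument would come from `scan(root)`; the in-memory list above only keeps the example independent of any particular directory.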
In an optional implementation manner of the first aspect of the present invention, the searching whether the file name containing the search keyword exists in the file index in an element-by-element matching manner includes:
determining the minimum target file classification of the search keyword in the file index based on the constituent elements of the search keyword; in the invention, because the first class and the second class in the file index are formed according to the element composition form of the search keyword, the search goes directly into the first class when the search keyword is matched against file names; after entering the first class, the first constituent element of the search keyword is obtained, the specific subclass is determined from that first constituent element, and the constituent elements are then matched within the corresponding subclass;
matching each constituent element of the search keyword, element by element, against each file name in the minimum target file classification; in the invention, the matching proceeds by taking one constituent element at a time from the search keyword in left-to-right order, taking the constituent element at the corresponding position from the file name, and comparing the two, one pair at a time;
when a file name containing every constituent element of the search keyword is matched, taking that file name as the target file name; in the invention, after all constituent elements of the search keyword have been compared, a file name that contains the search keyword is taken as the target file name corresponding to the search keyword;
and acquiring the file storage address corresponding to the target file name, and obtaining the target file based on the file storage address. In the invention, the file storage address is linked under the target file name, so in this step the target file name is fed back to the user; by clicking the target file name the user jumps automatically to the location of the target file, and the search is completed when the user manually picks out the target file at that location.
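A possible reading of the lookup and element-by-element matching is sketched below. It assumes an in-memory index keyed by `("first", subclass)` and `("second", None)`, which is a hypothetical layout, and it interprets "corresponding positions" as positions aligned from the start of the file name, so the keyword is matched as a prefix. That interpretation is an assumption; the text itself only says that the file name contains the search keyword.

```python
def matches(keyword, name):
    """Compare keyword and file name element by element, left to right."""
    if len(keyword) > len(name):
        return False
    for k, n in zip(keyword, name):   # elements at corresponding positions
        if k != n:
            return False
    return True


def search(index, keyword, intervals):
    """Return (file name, storage address) pairs from the smallest class."""
    if not keyword:
        return []
    first = keyword[0].lower()
    # Pick the subclass whose interval contains the first element.
    sub = next((i for i, iv in enumerate(intervals) if first in iv), None)
    bucket = index.get(("first", sub), [])
    return [(name, path) for name, path in bucket if matches(keyword, name)]


index = {
    ("first", 0): [("2023report", "/a/2023report.txt"),
                   ("2024plan", "/a/2024plan.txt")],
    ("first", 1): [("7days", "/b/7days.md")],
    ("second", None): [("notes", "/c/notes.txt")],
}
intervals = ["0123", "4567", "89"]
print(search(index, "2023r", intervals))
```

Only the two names in subclass 0 are compared for the keyword `2023r`; the other classes are never touched, which is where the reduction in matching work comes from.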
Referring to fig. 2, a second aspect of the present invention provides a file search system including:
a receiving module 10, configured to receive a search keyword input by a user;
a keyword parsing module 20, configured to parse the search keyword to obtain an element composition form of the search keyword;
a classification policy generation module 30, configured to generate a file name classification policy adapted to the element composition form based on the element composition form;
a file scanning and index generating module 40, configured to scan all storage files in a file storage area, and generate file indexes of all the storage files based on the file name classification policy;
a file name matching module 50, configured to find whether a file name containing the search keyword exists in the file index in a manner of element-by-element matching;
a file storage address obtaining module 60, configured to obtain a file storage address corresponding to a file name that includes the search keyword when the file name is found;
the target file obtaining module 70 is configured to obtain a target file based on the file storage address.
In an alternative embodiment of the second aspect of the present invention, the keyword parsing module 20 includes:
an element traversing unit, configured to traverse each component element of the search keyword, and record types of each component element, where the types of component elements include Chinese characters, letters, numbers, and symbols;
and the type statistics unit is used for counting the types of the obtained constituent elements to obtain the element composition form of the search keyword, wherein the element composition forms comprise pure Chinese characters, pure letters, pure numbers, pure symbols, Chinese character-letter combinations, Chinese character-number combinations, Chinese character-symbol combinations, letter-number combinations, letter-symbol combinations, Chinese character-letter-number combinations, Chinese character-letter-symbol combinations and letter-number-symbol combinations.
In an optional embodiment of the second aspect of the present invention, the file name classification policy includes:
for the file names of all stored files in the file storage area, acquiring the element composition form of each file name, classifying the file names whose element composition form is identical to that of the search keyword into a first class, classifying the file names whose element composition form differs from that of the search keyword into a second class, and performing subdivision at least once in the first class according to the types of the constituent elements of the search keyword.
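The first-class and second-class split can be sketched as below. This is a minimal illustration, not the patented implementation: `form_of` and `split_classes` are hypothetical names, the character ranges are the same assumptions as in any simple character classifier (basic CJK block, ASCII letters and digits), and file names are assumed to be compared without their extensions, which the disclosure does not specify.

```python
from typing import Dict, FrozenSet, List


def form_of(text: str) -> FrozenSet[str]:
    """Set of element types present in text (hypothetical helper)."""
    kinds = set()
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":  # assumed range for Chinese characters
            kinds.add("chinese")
        elif ch.isascii() and ch.isalpha():
            kinds.add("letter")
        elif ch.isascii() and ch.isdigit():
            kinds.add("number")
        else:
            kinds.add("symbol")
    return frozenset(kinds)


def split_classes(keyword: str, file_names: List[str]) -> Dict[str, List[str]]:
    """Put names with the keyword's composition form in 'first', the rest in 'second'."""
    target = form_of(keyword)
    classes: Dict[str, List[str]] = {"first": [], "second": []}
    for name in file_names:
        key = "first" if form_of(name) == target else "second"
        classes[key].append(name)
    return classes
```

For a letter-number keyword such as `ab12`, only names that themselves consist of letters and digits land in the first class; the search then only needs to look inside that class.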
In an optional embodiment of the second aspect of the present invention, performing the subdivision at least once in the first class according to the types of the constituent elements of the search keyword includes:
acquiring the type of the first component element of the search keyword;
determining whether the type of the first component element of the search keyword is a partitionable-interval type, where the partitionable-interval types include Chinese characters, letters, and numbers;
if the type of the first component element of the search keyword is a partitionable-interval type, further dividing the first class into a plurality of subclasses.
In an optional embodiment of the second aspect of the present invention, the further dividing of the first class into a plurality of subclasses includes:
counting the number of the file names belonging to the first class;
acquiring a preset expected value of the number of files per class;
dividing the number by the expected value and rounding the result to obtain a division value for the subclasses;
dividing the first class into a plurality of subclasses, using the division value as the number of subclasses.
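The subdivision steps above can be sketched as follows. Several details are assumptions made for the sake of a runnable example, since this passage does not state them: rounding is taken as round-half-up with a minimum of one subclass, the intervals are equal-width ranges over the code points of the first element's type, and names whose first element falls outside that range go to an extra catch-all group. All function names are hypothetical.

```python
import math
from typing import Dict, List

# Assumed code-point bounds for the partitionable-interval types.
RANGES = {
    "letter": (ord("a"), ord("z")),
    "number": (ord("0"), ord("9")),
    "chinese": (0x4E00, 0x9FFF),
}


def division_value(name_count: int, expected_per_class: int) -> int:
    """Number of subclasses: count / expected, rounded half up, at least 1."""
    return max(1, math.floor(name_count / expected_per_class + 0.5))


def subclass_of(first_char: str, kind: str, k: int) -> int:
    """Index of the equal-width interval containing first_char, or k if out of range."""
    lo, hi = RANGES[kind]
    # Fold ASCII case so 'A' and 'a' share an interval.
    ch = first_char.lower() if first_char.isascii() else first_char
    code = ord(ch)
    if not lo <= code <= hi:
        return k  # catch-all group (assumption)
    return (code - lo) * k // (hi - lo + 1)


def subdivide(first_class: List[str], kind: str, expected: int) -> Dict[int, List[str]]:
    """Split the first class into division_value(...) subclasses by first element."""
    k = division_value(len(first_class), expected)
    buckets: Dict[int, List[str]] = {i: [] for i in range(k)}
    for name in first_class:
        buckets.setdefault(subclass_of(name[0], kind, k), []).append(name)
    return buckets
```

With six pure-letter names and an expected value of three files per class, the division value is two, so the alphabet is split into the intervals a to m and n to z.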
In an alternative embodiment of the second aspect of the present invention, the file scanning and index generating module 40 includes:
a file scanning unit, configured to scan all the storage files in the file storage area and acquire the file names of all the storage files;
a classification determining unit configured to determine a file classification in the file index based on the file name classification policy;
a file name classification matching unit, configured to match the file names one by one and obtain the file classification corresponding to each file name in the file index;
a file name writing unit, configured to write each file name and its corresponding file storage address into the file index according to the file classification corresponding to that file name in the file index.
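A minimal sketch of the scanning and index-writing units, assuming the file storage area is an ordinary directory tree and the file storage address is the file's path. The function `build_index` and its `classify` callback are hypothetical; `classify` stands in for whatever file name classification policy has been generated.

```python
import os
from typing import Callable, Dict, Hashable, List, Tuple

# One index entry: (file name, file storage address).
Entry = Tuple[str, str]
Index = Dict[Hashable, List[Entry]]


def build_index(root: str, classify: Callable[[str], Hashable]) -> Index:
    """Scan every file under root and file it under the class chosen by classify."""
    index: Index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            address = os.path.join(dirpath, name)
            # Write the name and its address under the matching classification.
            index.setdefault(classify(name), []).append((name, address))
    return index
```

Keeping the storage address next to each name means a later match can return the address directly, without a second scan of the storage area.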
In an alternative embodiment of the second aspect of the present invention, the file name matching module 50 includes:
a minimum classification acquisition unit, configured to determine a minimum target file classification of the search keyword in the file index based on each constituent element of the search keyword;
a file name matching unit, configured to match each component element of the search keyword with each file name in the minimum target file classification element by element;
a file name acquisition unit, configured to, when a file name containing each constituent element of the search keyword is matched, take that file name as a target file name.
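The matching units might be sketched as below. This is an illustrative reading in which "element-by-element matching" is a character-by-character comparison of the keyword against each position of a file name, and the minimum target file classification has already been resolved to a key of the index. The index layout (class key mapped to a list of name and address pairs) and the names `contains_elementwise` and `search` are assumptions.

```python
from typing import Dict, List, Tuple

# One index entry: (file name, file storage address).
Entry = Tuple[str, str]


def contains_elementwise(name: str, keyword: str) -> bool:
    """Return True if keyword occurs in name, comparing one element at a time."""
    n, m = len(name), len(keyword)
    for start in range(n - m + 1):
        if all(name[start + i] == keyword[i] for i in range(m)):
            return True
    return False


def search(index: Dict[str, List[Entry]], class_key: str, keyword: str) -> List[Entry]:
    """Match the keyword only against names in the smallest target classification."""
    return [
        (name, address)
        for name, address in index.get(class_key, [])
        if contains_elementwise(name, keyword)
    ]
```

Because only one classification is examined, names in other classes are never compared against the keyword, which is where the reduction in search work comes from.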
Fig. 3 is a schematic diagram of a file searching apparatus according to an embodiment of the present invention. The apparatus may vary considerably in configuration or performance, and may include one or more processors 80 (central processing units, CPU), a memory 90, and one or more storage media 100 (e.g., one or more mass storage devices) storing application programs or data. The memory and the storage medium may provide transitory or persistent storage. The program stored on the storage medium may include one or more modules (not shown), each of which may include a series of instruction operations for the file searching apparatus. Further, the processor may be configured to communicate with the storage medium and execute, on the file searching apparatus, the series of instruction operations stored in the storage medium.
The file searching apparatus of the present invention may also include one or more power supplies 110, one or more wired or wireless network interfaces 120, one or more input/output interfaces 130, and/or one or more operating systems, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the file searching apparatus structure shown in fig. 3 does not constitute a particular limitation of the file searching apparatus of the present invention, which may include more or fewer components than those illustrated, may combine certain components, or may use a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, or may be a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of the file searching method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or the unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method of searching for a file, comprising the steps of:
receiving search keywords input by a user;
analyzing the search keywords to obtain element composition forms of the search keywords;
generating a file name classification strategy adapted to the element composition form based on the element composition form;
scanning all storage files in a file storage area, and generating file indexes of all the storage files based on the file name classification strategy;
searching whether file names containing the search keywords exist in the file index in a mode of element-by-element matching;
when the file name comprising the search keyword is found, acquiring a file storage address corresponding to the file name;
obtaining a target file based on the file storage address;
the file name classification strategy comprises: for the file names of all stored files in the file storage area, acquiring the element composition form of each file name, classifying the file names whose element composition form is identical to that of the search keyword into a first class, classifying the file names whose element composition form differs from that of the search keyword into a second class, and performing subdivision at least once in the first class according to the types of the constituent elements of the search keyword;
the parsing of the search keyword to obtain the element composition form of the search keyword comprises:
traversing each component element of the search keyword, and recording the type of each component element, wherein the type of each component element comprises Chinese characters, letters, numbers and symbols;
counting the recorded types of the constituent elements to obtain the element composition form of the search keyword, wherein the element composition forms comprise pure Chinese characters, pure letters, pure numbers, pure symbols, Chinese character-letter combinations, Chinese character-number combinations, Chinese character-symbol combinations, letter-number combinations, letter-symbol combinations, Chinese character-letter-number combinations, Chinese character-letter-symbol combinations and letter-number-symbol combinations;
the performing of subdivision at least once in the first class according to the types of the constituent elements of the search keyword comprises:
acquiring the type of the first component element of the search keyword;
judging whether the type of the first component element of the search keyword belongs to the type of a partitionable interval or not, wherein the type of the partitionable interval comprises Chinese characters, letters and numbers;
if the type of the first component element of the search keyword belongs to the type of the partitionable interval, a plurality of subclasses are further partitioned in the first class;
said further partitioning of the first class into a plurality of subclasses includes:
counting the number of the file names belonging to the first class;
acquiring an expected value of the number of preset classified files;
dividing the number by the expected value and rounding to obtain a dividing value of the subclass;
and dividing the first class into a plurality of subclasses, using the division value as the number of subclasses.
2. The file searching method of claim 1, wherein scanning all stored files in a file storage area, generating file indexes of all the stored files based on the file name classification policy comprises:
scanning all storage files in a file storage area to obtain file names of all the storage files;
determining a file classification in the file index based on the file name classification policy;
matching the file names one by one, and obtaining the file classifications corresponding to the file names in the file indexes;
and writing each file name and a file storage address corresponding to each file name into the file index according to the file classification corresponding to each file name in the file index.
3. The method according to claim 1, wherein said searching for whether there is a file name containing the search keyword in the file index by element-by-element matching includes:
determining a minimum target file classification of the search keyword in the file index based on each component element of the search keyword;
matching each component element of the search keyword with each file name in the minimum target file classification element by element;
when a file name containing each constituent element of the search keyword is matched, taking that file name as a target file name.
4. A file search system, the file search system comprising:
a receiving module, configured to receive a search keyword input by a user;
a keyword parsing module, configured to parse the search keyword to obtain an element composition form of the search keyword;
a classification policy generation module, configured to generate a file name classification strategy adapted to the element composition form based on the element composition form;
a file scanning and index generating module, configured to scan all storage files in a file storage area and generate file indexes of all the storage files based on the file name classification strategy;
a file name matching module, configured to search, by element-by-element matching, whether a file name containing the search keyword exists in the file index;
a file storage address obtaining module, configured to obtain a file storage address corresponding to the file name when a file name containing the search keyword is found;
a target file obtaining module, configured to obtain a target file based on the file storage address;
the file name classification strategy comprises: for the file names of all stored files in the file storage area, acquiring the element composition form of each file name, classifying the file names whose element composition form is identical to that of the search keyword into a first class, classifying the file names whose element composition form differs from that of the search keyword into a second class, and performing subdivision at least once in the first class according to the types of the constituent elements of the search keyword;
the keyword parsing module comprises:
an element traversing unit, configured to traverse each component element of the search keyword, and record types of each component element, where the types of component elements include Chinese characters, letters, numbers, and symbols;
a type statistics unit, configured to count the recorded types of the constituent elements and obtain the element composition form of the search keyword, where the element composition forms include pure Chinese characters, pure letters, pure numbers, pure symbols, Chinese character-letter combinations, Chinese character-number combinations, Chinese character-symbol combinations, letter-number combinations, letter-symbol combinations, Chinese character-letter-number combinations, Chinese character-letter-symbol combinations, and letter-number-symbol combinations;
the performing of subdivision at least once in the first class according to the types of the constituent elements of the search keyword comprises:
acquiring the type of the first component element of the search keyword;
judging whether the type of the first component element of the search keyword belongs to the type of a partitionable interval or not, wherein the type of the partitionable interval comprises Chinese characters, letters and numbers;
if the type of the first component element of the search keyword belongs to the type of the partitionable interval, a plurality of subclasses are further partitioned in the first class;
said further partitioning of the first class into a plurality of subclasses includes:
counting the number of the file names belonging to the first class;
acquiring an expected value of the number of preset classified files;
dividing the number by the expected value and rounding to obtain a dividing value of the subclass;
and dividing the first class into a plurality of subclasses, using the division value as the number of subclasses.
5. A file searching apparatus, characterized in that the file searching apparatus comprises: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
the at least one processor invoking the instructions in the memory to cause the file searching apparatus to perform the file searching method of any of claims 1-3.
6. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a file searching method according to any of claims 1-3.
CN202310043240.9A 2023-01-29 2023-01-29 File searching method, system, equipment and storage medium Active CN115794745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310043240.9A CN115794745B (en) 2023-01-29 2023-01-29 File searching method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310043240.9A CN115794745B (en) 2023-01-29 2023-01-29 File searching method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115794745A CN115794745A (en) 2023-03-14
CN115794745B true CN115794745B (en) 2023-07-18

Family

ID=85429049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310043240.9A Active CN115794745B (en) 2023-01-29 2023-01-29 File searching method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115794745B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116482521B (en) * 2023-06-25 2023-10-20 江西兆驰半导体有限公司 Chip testing method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114153791A (en) * 2021-10-14 2022-03-08 北京鸿合爱学教育科技有限公司 File fast retrieval method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236706B (en) * 2011-06-17 2012-12-05 浙江大学 Fast fuzzy pinyin inquiry method of mass Chinese file names
JP5737079B2 (en) * 2011-08-31 2015-06-17 カシオ計算機株式会社 Text search device, text search program, and text search method
CN102999601A (en) * 2012-11-20 2013-03-27 广东欧珀移动通信有限公司 Method for sorting files, and multimedia terminal
CN105279278B (en) * 2015-11-13 2019-03-12 珠海豹趣科技有限公司 The searching method and device of file
US11586586B2 (en) * 2019-06-03 2023-02-21 EMC IP Holding Company LLC Indexes and queries for files by indexing file directories
CN115145871A (en) * 2022-07-01 2022-10-04 中国银行股份有限公司 File query method and device and electronic equipment

Also Published As

Publication number Publication date
CN115794745A (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant