CN1360267A - Sorting and searching method for files - Google Patents

Sorting and searching method for files

Info

Publication number
CN1360267A
Authority
CN
China
Prior art keywords
file
inquiry
files
classification
extension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 02100839
Other languages
Chinese (zh)
Inventor
陈华
李晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN 02100839 priority Critical patent/CN1360267A/en
Publication of CN1360267A publication Critical patent/CN1360267A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A classified file search method, characterized in that directory-based classified search merges each file name with the name of its parent directory, making both subject-oriented and content-oriented file search possible; it is combined with file search methods based on file extension and on query frequency to improve search efficiency and accuracy.

Description

Sorting and searching method for files
Technical field: The present invention relates to information retrieval technology and is aimed primarily at file search.
Background: Current file search technology largely imitates web page search, but the two differ in essence. A web search engine can analyze page content, whereas a file search system cannot realistically download every remote file for content analysis; even if it could, the diversity of file formats makes the content of non-document files hard to analyze. The only data available for file search are therefore the file name and the file attributes. Existing file search systems, including FTP search engines and the Windows file finder, implement only file name matching and attribute filtering. This file-name-oriented search cannot support searching by topic, cannot uncover files whose names do not reflect their content, and demands too much of the user's knowledge of the search target. To address these problems, some special-purpose search engines offer partial solutions. For example, the Napster music search engine for mp3 files analyzes information such as the artist and title of each mp3 file, then classifies files and offers content-based queries accordingly. However, this technique must read (or download) every file to analyze its content, and the system must provide a separate data processing routine for every file type that queries need to support. The cost of this approach is therefore very high; downloading and analyzing all files makes data collection very slow, which degrades query results, and the range of file types the approach can handle is very limited.
Summary of the invention: We therefore provide a solution whose purpose is to classify files and to query by content without opening files to analyze their content, and to simplify the query system. This technique supplements established file search systems. Its aims are to use classification to improve the recall and precision of file search, to remedy the shortcomings of traditional file search, to support subject-oriented and content-oriented queries, to uncover data hidden from ordinary query systems, and to propose new ways of applying file search.
The content and technical scheme of the present invention are as follows:
The file search method of the present invention comprises three classified query methods, based on file extension, directory, and query frequency. Together they constitute a complete classification-based file search technique.
1. Classified query by file format, based on the file name
To analyze the types of query strings users submit, we collected statistics on 840,000 user-entered query strings from an FTP search engine and obtained the query string type distribution shown in Fig. 1. In the figure, I is the share of single-keyword queries, II the share of extension-only queries, and III the share of full-file-name queries. Fig. 1 shows that most users enter only a keyword and do not supply a specific extension. For ordinary users, extensions are hard to understand. A movie file, for example, may have the extension ".rm", ".mpeg", ".dat", and so on; requiring users to supply an extension in order to find a movie would drive ordinary users away from the query system. But if the user supplies no extension and the whole database is searched, many results will not be what the user wants, for example when a query for a program's download address returns the download address of its source code instead, so precision is low. What an ordinary user usually needs is a file of a certain type rather than a file with a particular extension: the user may want a music file without caring whether it is an ".mp3" or an ".au" file. Even a user who knows the extensions must specify several of them to find all download addresses of one song, or risk missing many of them; this is tedious for the user and not easy to implement.
To relieve ordinary users of the burden of remembering extensions and to support file queries within a broad category, all files can be divided into a few simple file format types. When querying, the user specifies only the file format type needed, not a concrete extension. Following common usage, the file format types can be divided into broad categories such as image, sound, video, compressed, document, program, source code, directory, and "other". The query system assigns each file format category a number and defines a large set of "well-known extensions" belonging to it. Because the file format is reflected in the file's extension, and the query system cannot open each file to detect its actual format, the "well-known extensions" serve as the criterion for file format classification. A "well-known extension" reflects the common understanding of which type of file an extension denotes; for example, ".doc", ".ppt", ".txt", and ".pdf" belong to the document type. If a file uses ".doc" as its extension but is not in the generally accepted ".doc" format, the system does not take this into account. When an extension could belong to several categories, its most common category is used. When the query system obtains a file entry, it uses the extension to determine the corresponding file format category and saves it in the attributes of the file entry. When a user queries for files of a specified file format type, the type number selected by the user is compared with the type number in the file attributes, and the results of file name matching are filtered down to those that also match the specified file format type. Fig. 2 is a schematic of classification by file format: I shows some files before classification, and II shows the three classes into which they are divided after classification: music, video, and document.
The file format classified query method based on file extension uses the extension of the file name as the classification criterion: files are divided into format types, and each format type corresponds to a number of extensions. The file format types include document, video, audio, image, program, source code, directory, and so on. The type assigned to an extension follows the common understanding of the file type it denotes; when an extension could belong to several categories, its most common category is used. When the query system obtains a file entry, it uses the extension to determine the corresponding file format category and saves it in the attributes of the file entry. When a user queries for files of a specified file format type, the type number selected by the user is compared with the type number in the file entry's attributes, and the results of file name matching are filtered down to those that also match the specified file format type.
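The classification and filtering steps described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used by the inventors: the category numbers, the extension lists, and the names `classify`, `build_entries`, and `search` are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical category numbers and "well-known extension" lists (illustrative only).
FORMAT_TYPES = {
    1: ("image", {".jpg", ".gif", ".bmp"}),
    2: ("audio", {".mp3", ".au", ".wav"}),
    3: ("video", {".rm", ".mpeg", ".avi"}),
    4: ("document", {".doc", ".ppt", ".txt", ".pdf", ".htm"}),
}
OTHER = 0  # assumed number for the "other" category

# Each extension maps to exactly one category (its most common one).
EXT_TO_TYPE = {ext: num for num, (_, exts) in FORMAT_TYPES.items() for ext in exts}


def classify(filename):
    """Return the format type number for a file, judged by extension only."""
    ext = PurePosixPath(filename).suffix.lower()
    return EXT_TO_TYPE.get(ext, OTHER)


def build_entries(paths):
    """Store the type number as an attribute of each file entry at collection time."""
    return [
        {"path": p, "name": PurePosixPath(p).name, "type": classify(p)}
        for p in paths
    ]


def search(entries, keyword, type_number=None):
    """Match the keyword against file names, then filter by the selected type number."""
    keyword = keyword.lower()
    hits = [e for e in entries if keyword in e["name"].lower()]
    if type_number is not None:
        hits = [e for e in hits if e["type"] == type_number]
    return hits
```

With entries built from paths such as `/pub/luxun.txt` and `/pub/luxun.exe`, calling `search(entries, "luxun", 4)` would keep only the document-type file, without the user naming any extension.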
2. Classified query by file content, based on the directory
In a query system oriented to file names, it is impractical to read (or download) each file to analyze its content, so all analysis of file content must rely on the file name. Although file names generally reflect file content, we found that the names of many multimedia files (video, audio, and image files) do not. Video files are often named a.rm, b.rm rather than a specific "movie name.rm". For audio files, on the one hand the file name may be the song title without the singer's name, although a user's query may need both, since querying for all songs by a singer is very common; on the other hand, as with movie files, audio ripped from a CD may be named track0.mp3, track1.mp3, and so on, and such names say nothing about the music. Image files are frequently named with numbers, for example 1.jpg, 2.jpg, and so on, because images often come in series and giving each of many similar images its own meaningful name is very tedious. We analyzed the file names of 8,642,123 multimedia files and obtained the multimedia file name characteristics in Table [1]. As Table [1] shows, this characteristic of multimedia file names hinders normal multimedia file queries.
Table [1]
Type (by file format)    Share of all files    Share of files whose name does not reflect content
Video file               1.08%                 7.35%
Audio file               14.99%                2.17%
Image file               8.16%                 6.73%
To solve the problem of file names that do not reflect file content, first consider the role of file system directories. Most operating systems adopt a tree-structured directory layout because it provides powerful classification: the name of each directory reflects the content or related attributes of the files and subdirectories under it. In particular, for directories containing multimedia files of the kind described above, the directory name generally reflects the content of the multimedia files in it. But even though the parent directory of a multimedia file generally reflects the file's content, when a user queries a keyword, many of the results may be directory names, and the user must enter the directories one by one to find out whether the files inside are really the ones wanted. This slow operation cancels out the fast query capability of the search engine. How can the user determine whether a directory holds the wanted files without entering each one?
The solution is to match the query string against the name of the directory containing a multimedia file together with the file name. Using the file format type numbers produced by the file format classification above, the file name of each file entry of audio, video, or image type is merged with the name of its parent directory, if any, and treated as a single unit. The query system uses this unit as a whole when building the index, processing user queries, and displaying results; of course, when the download link is finally output, the link must still be correct. Fig. 3 is a schematic of classification by file format and file content. In the figure, I shows the FTP file list before classification by file format and file content, and II shows the file list after classification, in which the file names of video and audio files are merged with their parent directory names and the file path is stored separately as a file attribute.
Classified query by file content based on the directory merges a file's name with the name of its parent directory and uses them as a whole in the query; a result is a hit if either its file name or its parent directory name matches the query string. This file search method is used for multimedia files, including audio, video, and image files: when the user searches for such files, the directory-based file content classified query is used, and the name of the containing directory is matched against the query string together with the file name. The query system treats the file name and its parent directory as a whole when building the index, querying, and displaying results.
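The merging of file name and parent directory name can be sketched as below. This is an illustrative Python fragment under stated assumptions, not the patented system's code: the multimedia extension set and the names `index_key`, `build_index`, and `search` are hypothetical, and the full path is kept alongside the merged key so that the download link stays correct.

```python
from pathlib import PurePosixPath

# Assumed set of multimedia extensions (audio, video, image); illustrative only.
MULTIMEDIA_EXTS = {".rm", ".mpeg", ".mp3", ".au", ".jpg", ".gif"}


def index_key(path):
    """Return the string that the query is matched against for this file.

    For multimedia files the parent directory name and the file name are
    merged into one unit; other files are indexed by file name alone.
    """
    p = PurePosixPath(path)
    if p.suffix.lower() in MULTIMEDIA_EXTS and p.parent.name:
        return f"{p.parent.name}/{p.name}"
    return p.name


def build_index(paths):
    """Pair each merged key with the full path, which is stored as an attribute."""
    return [(index_key(p).lower(), p) for p in paths]


def search(index, keyword):
    """A file is a hit if the keyword occurs in its file name or parent directory name."""
    keyword = keyword.lower()
    return [path for key, path in index if keyword in key]
```

For a path such as `/tv/Tokyo Love Story/tls01.rm`, the merged key contains the directory name, so a query for "Tokyo Love Story" returns the file itself rather than only the directory.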
3. Classified query based on query frequency
Naive users without search know-how often submit poor search requests that cannot return the information they need, yet they make up the vast majority of Internet users, and this has never changed. Analysis of user query logs leads to the conclusion that most users are in this position: I cannot express what I am looking for, but I will recognize it when I see it. A search engine that offers only an input box and a lot of complicated forms may leave ordinary users at a loss. One characteristic of an FTP search engine is that the range of keywords users search for is fairly limited: of the more than 90,000 queries in our statistics, only a little over 5,000 were distinct. If popular queries are made into shortcuts, so that one click returns the query results for, say, a piece of software, then the user no longer merely tells the search engine what he or she wants; the search engine tells the user what is available.
A shortcut is defined as a name that labels the URL link corresponding to a query. Once the search engine has the file format classification and file content classification functions, a system of query shortcuts becomes feasible, because shortcuts that make full use of both classification capabilities can give very accurate and comprehensive query results.
As shortcuts multiply, presenting all of them to the user would make finding any one shortcut very troublesome, so shortcuts must be classified. A two-level query classification is appropriate. The first level resembles the file format categories, for example movies, music, programs, and documents. The second level classifies by content within each first-level category, for example action and romance under movies, or system, compression, and games under programs. Once this two-level shortcut system is set up, users and administrators add frequently issued queries to each category as shortcuts. A CGI program records the number of clicks on each shortcut, and when all shortcut entries of a category are displayed they are output in descending order of clicks, so the user can see the current ranking of, for example, software in that category. Shortcuts under some categories default to a specific file format; for example, shortcuts in the movie categories default to the video file format type. In this way shortcuts are automatically combined with the file classification function, which ensures their accuracy. Fig. 4 is a logical schematic of the shortcut system. In the figure, 1 is displaying the shortcut list, 2 is displaying the shortcuts within a category, 3 is querying via the query URL corresponding to a shortcut, 4 is a user registering a new shortcut, 5 is the administrator screening user-registered shortcuts, 6 is the administrator managing existing shortcuts, and 7 is the shortcut database.
In the classified query based on query frequency, commonly used query URLs are classified at two levels: the first level is the file format category, and the second level is a classification by content within that category. A program can also record the number of clicks on each shortcut; when all shortcut entries of a category are displayed, they are output sorted by clicks, which at the same time provides a query ranking for that category.
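A two-level shortcut store with click counting might look like the following sketch. It is an in-memory Python illustration only; the class name `ShortcutStore`, its methods, and the query URL format are assumptions for the example, whereas the text above describes a CGI program backed by a shortcut database.

```python
from collections import defaultdict


class ShortcutStore:
    """In-memory stand-in for the shortcut database (hypothetical design)."""

    def __init__(self):
        # (first-level category, second-level category) -> {name: [query_url, clicks]}
        self._shortcuts = defaultdict(dict)

    def add(self, level1, level2, name, query_url):
        """Register a frequently issued query as a named shortcut."""
        self._shortcuts[(level1, level2)][name] = [query_url, 0]

    def click(self, level1, level2, name):
        """Record one click and return the query URL to execute."""
        entry = self._shortcuts[(level1, level2)][name]
        entry[1] += 1
        return entry[0]

    def ranking(self, level1, level2):
        """List shortcut names in a category in descending order of clicks."""
        items = self._shortcuts[(level1, level2)].items()
        ordered = sorted(items, key=lambda kv: -kv[1][1])
        return [name for name, _ in ordered]
```

Here a shortcut in a movie category would carry the video format type inside its (assumed) query URL, for example `/search?q=...&type=video`, so that clicking it combines the fixed query string with the format filter.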
Of the three methods above, the second, the directory-based file content classified query, can be used alone or in combination with the other two for searching multimedia files. That is, queries can be made according to the directory-based file content classification: when the user specifies a search for multimedia files, the query system returns files whose file name or parent directory name matches the query keyword.
The other two methods, the file format classified query based on file name and the classified query based on query frequency, can also be used in combination. Under the file format classification based on file extension, the user enters two search conditions, a file name keyword and a file format type, and the query system outputs the files that satisfy both. Under the classification based on query frequency, the user selects the desired file from the lists of frequently searched files that the query system provides in each category, ordered by query frequency.
Brief description of the drawings:
Fig. 1: Type distribution of query strings
Fig. 2: Schematic of file format classification based on file name
Fig. 3: Schematic of classification by file format and file content
Fig. 4: Schematic of classified query based on query frequency
Fig. 5: Example of a file format classified query based on file name
Fig. 6: Example of a classified query by file content
Fig. 7: Two-level shortcut classification page
Fig. 8: Shortcuts within a particular category of Fig. 7
Embodiment:
The invention is further described below with reference to an embodiment.
The "day net" FTP search engine has been a project in the networks and distributed systems area of the computer science and technology department of Peking University since 1999. At present the Peking University "day net" FTP search engine is a powerful FTP search engine that indexes more than 3,000 sites nationwide, holds 13 million FTP file entries, and uses the classified file search techniques based on file name, directory, and query frequency. The average query currently takes about 200 milliseconds, and the system handles about 100,000 queries per day, a number that keeps rising.
1. Example of a file format classified query based on file name
In the query of Fig. 5, the user entered only the keyword "Lu xun" and chose to search within the document type; the results include files in various formats (.txt, .doc, and .htm) whose names contain "Lu xun". That is, the user obtains the desired results within a particular type without specifying a particular extension. If the user had not specified a type, many results might not be what the user wants, the user would have to page through the results to find files of the desired type, and precision would be low. In this example the user usually does not care whether a file is in .txt or .doc format; if the system relied on the user to supply an extension, it might not return all files with similar content.
2. Classified query by file content, based on the directory
In the query of Fig. 6, the user entered the keyword "Tokyo Love Story". Most file names in the results do not contain "Tokyo Love Story" but have the form tls0?.rm; that is, the file names do not reflect the file content. Because their parent directory name contains "Tokyo Love Story", the directory-based file content classified query lets these files be found. Otherwise the user might see only some directories containing "Tokyo Love Story" and would have to enter each one to learn whether the files in it are the ones wanted.
3. Classified query based on query frequency
Fig. 7 and Fig. 8 show, respectively, the category page of the query classification and the shortcut page within one category (the "swordsman" class under "film, cartoon"). The category page makes it easy to find the shortcuts of a particular category, and the shortcut page shows commonly used queries; the user only needs to click to obtain the query results, without typing anything.
The advantages and positive effects of the present invention are:
Compared with existing file-name-oriented query techniques, the classified file search technique based on file name, directory, and query frequency has the following advantages and positive effects:
1. The precision of the file search system is greatly improved. With the file format classified query based on file name, one general-purpose file search engine becomes several topic-specific search engines. Users can search within a specified type without caring about extensions. Especially when the number of file name matches is huge, showing only results of one type greatly reduces the number of pages the user must turn and improves query efficiency. For example, when querying for documentation related to C++builder, a plain file-name-oriented query yields 237 hits with no extension specified and only 7 hits with the .doc extension specified; with the file format classified query restricted to the document type, there are 19 hits. This result excludes unneeded information about other files (such as C++builder program files) while including the needed documents in all their various formats.
2. The recall of the query system is improved. With the file format classified query based on file name and the directory-based file content classified query, the number of hits when searching for multimedia files increases substantially, and many files named with numbers or sequence numbers can now be found. Queries for TV series, singers, albums, and picture collections all become convenient and intuitive. This improvement turns the query system from a general file search system into a powerful tool centered on multimedia queries for movies and music, while retaining general query capability.
3. The query system becomes simpler and easier to use. Classifying queries and setting up a shortcut system can greatly encourage ordinary users to use the file search system. Because the query classification is built on the file format classification and file content classification techniques, the various complicated query options (file format type, size limits, and so on) are all hidden in the query URL corresponding to the shortcut. For the many users who do not know exactly which software they want (for example, someone who wants to watch an action movie and does not mind which one) or who are unsure of the name of the software they want (for example, someone looking for NetAnts who does not know that its file name is netant), using the query system becomes a matter of choosing rather than specifying. Once shortcuts are in use, shortcut queries will account for the majority of all queries, because the shortcuts provided by the system cover the queries most users need. And because the query strings of shortcuts are fixed, the cache hit rate of a query system with caching increases greatly; most queries obtain their results from the cache in a very short time, which further improves query efficiency.
4. It upgrades and importantly supplements file-name-oriented query technology. The classified file search technique based on file name, directory, and query frequency is not a replacement for traditional file search technology but an upgrade and supplement to it, because it does not propose how file name matching is to be carried out; it builds on file name matching and attribute filtering. With partial modification and additions, an existing file-name-oriented query system can become a classification-based query system while retaining its original file-name-oriented query function. The technique gives a file-name-oriented query system the ability to serve subject-oriented queries and to uncover hidden data, while the manual query classification designed with ordinary users in mind makes the query system more accessible and easier for users to accept.
The present invention can be applied in related fields including FTP search engines, MP3 search tools, local file search, and library resource retrieval.

Claims (24)

1, a kind of sorting and searching method for files, by the user input query request, computing machine returns satisfactory file according to user's query requests, it is characterized in that: the file content classified inquiry based on catalogue is adopted in the inquiry of computing machine; During inquiry the filename of file and its last layer catalogue merged and do the as a whole inquiry that is used for, hit results or filename have hit the coupling string, or its last layer directory name has hit the coupling string.
2, sorting and searching method for files according to claim 1 is characterized in that: this file polling method is used for the inquiry of multimedia file, comprises multimedia file types such as audio frequency, video and image; When the user searches this class multimedia file, adopt file content classified inquiry based on catalogue, the directory name and the filename of file place catalogue come along the matching inquiry string.
3, sorting and searching method for files according to claim 1 and 2, it is characterized in that: during computer inquery the as a whole inquiry that is used for is done in the filename and the merging of its last layer catalogue of file, inquiry system is being set up index, all filename and its last layer catalogue is being done as a whole carrying out when the result shows.
4, according to one of the arbitrary claim of claim 1-3 described sorting and searching method for files, it is characterized in that: the inquiry of computing machine is adopted based on the file content classified inquiry of catalogue and is inquired the method that combines by classification with the file layout based on file extension; Wherein the file layout classified inquiry method based on file extension is meant, the query requests of being imported during user inquiring comprises filename key word and two parts of file layout type, during inquiry in the form classification of appointment inquiry file and need not the specified file extension name.
5, according to one of the arbitrary claim of claim 1-3 described sorting and searching method for files, it is characterized in that: the method that combines with inquiry manual sort based on enquiry frequency based on the file content classified inquiry of catalogue is adopted in the inquiry of computing machine; Wherein the inquirer's work point class methods based on enquiry frequency are meant, inquiry URL commonly used is built up shortcut, and the user only need click shortcut just can obtain Query Result.
6, according to one of the arbitrary claim of claim 1-3 described sorting and searching method for files, it is characterized in that: the inquiry of computing machine is adopted based on the file content classified inquiry of catalogue and the method for inquiring by classification based on the file layout of file extension and combining based on the inquiry manual sort of enquiry frequency; Wherein the file layout classified inquiry method based on file extension is meant, the query requests of being imported during user inquiring comprises filename key word and two parts of file layout type, during inquiry in the form classification of appointment inquiry file and need not the specified file extension name; Inquirer's work point class methods based on enquiry frequency refer to, inquiry URL commonly used is built up shortcut, and the user only need click shortcut just can obtain Query Result.
7, according to claim 4 or 6 described sorting and searching method for files, it is characterized in that: in the file layout classified inquiry method based on file extension, utilize the standard of the extension name of filename as document classification, file is divided into various Format Type, every kind of corresponding some extension name of Format Type.
8, according to claim 4,6 or 7 described sorting and searching method for files, it is characterized in that: in the file layout classified inquiry method based on file extension, comprise types such as document, video, audio frequency, image, program, source code, catalogue based on the file layout type of file extension.
9. The sorting and searching method for files according to claim 7 or 8, characterized in that: the file-extension-based file format classification follows the common understanding of which file type a given extension corresponds to; where one extension could belong to several categories, its most common category is taken.
10. The sorting and searching method for files according to any one of claims 7-9, characterized in that: in the file-extension-based file format classification query method, when the query system obtains a file entry, it uses the entry's extension to determine the corresponding file format category and stores that category in the attributes of the file entry; when the user queries for files of a specified file format type, the type number selected by the user is compared with the type number in the file entry's attributes; and from the results obtained by file name matching, the query results whose file name matches and whose file format type also matches the specified type are selected.
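The classification and filtering steps described in claims 7-10 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the type numbers, the extension table, and the names `FileEntry`, `classify`, and `search` are all assumptions made for illustration, since the claims name the format categories but give neither a numbering nor an extension list.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical type numbers; the claims list the categories but not their numbering.
FORMAT_TYPES = {"document": 1, "video": 2, "audio": 3, "image": 4,
                "program": 5, "source code": 6, "directory": 7}

# Hypothetical extension table. Each format type corresponds to several
# extensions, and each extension is assigned to exactly one (most common) type.
EXTENSION_TO_TYPE = {
    ".txt": "document", ".pdf": "document", ".doc": "document",
    ".avi": "video", ".mpg": "video", ".rm": "video",
    ".mp3": "audio", ".wav": "audio",
    ".jpg": "image", ".gif": "image",
    ".exe": "program",
    ".c": "source code", ".java": "source code",
}

@dataclass
class FileEntry:
    path: str
    is_dir: bool = False
    type_no: int = 0  # 0 means "unclassified" in this sketch

def classify(entry):
    """Derive the format type from the extension and store it on the entry."""
    if entry.is_dir:
        entry.type_no = FORMAT_TYPES["directory"]
    else:
        ext = PurePosixPath(entry.path).suffix.lower()
        type_name = EXTENSION_TO_TYPE.get(ext)
        entry.type_no = FORMAT_TYPES[type_name] if type_name else 0
    return entry

def search(entries, keyword, type_no):
    """Match on file name first, then keep only entries of the requested type."""
    keyword = keyword.lower()
    name_hits = [e for e in entries
                 if keyword in PurePosixPath(e.path).name.lower()]
    return [e for e in name_hits if e.type_no == type_no]

# Usage: the type is computed once per entry, then compared at query time.
entries = [classify(FileEntry("/pub/linux/kernel.txt")),
           classify(FileEntry("/pub/video/linux_intro.avi")),
           classify(FileEntry("/pub/linux", is_dir=True))]
hits = search(entries, "linux", FORMAT_TYPES["video"])
```

Storing the type number on the entry means the extension lookup happens once when the entry is obtained, and the query itself only compares two integers.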
11. The sorting and searching method for files according to claim 5 or 6, characterized in that: in the query-frequency-based manual classification of queries, frequently used query URLs are classified at two levels: the first level classifies by file format, and the second level classifies by content within that format category.
12. The sorting and searching method for files according to claim 5, 6 or 11, characterized in that: in the query-frequency-based manual classification of queries, a program records the number of clicks on each shortcut; when all shortcut entries of a category are displayed, they are output sorted by click count, thereby also providing a query ranking for that category.
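The two-level shortcut classification and click-count ranking of claims 11 and 12 can be sketched as follows. This is an illustrative sketch under assumptions: the class name `ShortcutBoard`, its methods, the category names, and the `search.example` URLs are hypothetical and do not come from the application.

```python
from collections import defaultdict

class ShortcutBoard:
    """Shortcuts for frequently used query URLs, grouped by (format, content)."""

    def __init__(self):
        # (format category, content category) -> {shortcut label: query URL}
        self._urls = defaultdict(dict)
        # (format category, content category, label) -> number of clicks
        self._clicks = defaultdict(int)

    def add(self, fmt, content, label, url):
        self._urls[(fmt, content)][label] = url

    def click(self, fmt, content, label):
        """Record one click and return the query URL behind the shortcut."""
        url = self._urls[(fmt, content)][label]
        self._clicks[(fmt, content, label)] += 1
        return url

    def ranking(self, fmt, content):
        """Labels of one category, most-clicked first (ties keep insertion order)."""
        labels = self._urls[(fmt, content)]
        return sorted(labels,
                      key=lambda label: -self._clicks[(fmt, content, label)])

# Usage with hypothetical shortcuts in the "audio" / "pop" category.
board = ShortcutBoard()
board.add("audio", "pop", "song A", "http://search.example/q?word=songA&type=3")
board.add("audio", "pop", "song B", "http://search.example/q?word=songB&type=3")
board.click("audio", "pop", "song B")
board.click("audio", "pop", "song B")
board.click("audio", "pop", "song A")
top = board.ranking("audio", "pop")
```

Because the displayed order is derived from the recorded click counts, the same listing serves both as a set of shortcuts and as a popularity ranking for the category.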
13. The sorting and searching method for files according to any one of claims 1-12, characterized in that: multimedia files can be searched using the directory-based file content classification; when the user specifies a search for multimedia-type files, the query system returns the files whose file name, or whose upper-level directory name, matches the query keyword.
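A minimal Python sketch of the matching rule in claim 13 follows. It assumes that "upper-level directory" means only the immediate parent directory and that the multimedia types are identified by a small, hypothetical extension list; the function names `matches` and `search_multimedia` are likewise illustrative.

```python
from pathlib import PurePosixPath

def matches(path, keyword):
    """True if the keyword occurs in the file name or in its parent directory name."""
    p = PurePosixPath(path)
    keyword = keyword.lower()
    return keyword in p.name.lower() or keyword in p.parent.name.lower()

def search_multimedia(paths, keyword,
                      multimedia_exts=(".mp3", ".avi", ".rm", ".jpg")):
    """Keep multimedia files matched by file name or by parent directory name."""
    return [p for p in paths
            if PurePosixPath(p).suffix.lower() in multimedia_exts
            and matches(p, keyword)]

# Usage: the first file has an uninformative name but sits in a matching directory.
paths = ["/music/titanic/track01.mp3",
         "/music/other/titanic_theme.mp3",
         "/music/other/track02.mp3",
         "/docs/titanic/readme.txt"]
found = search_multimedia(paths, "titanic")
```

Matching on the directory name is what lets a file such as `track01.mp3` be found by subject even though its own name says nothing about its content.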
14. The sorting and searching method for files according to any one of claims 1-5, 7-10 and 13, characterized in that: using the file-extension-based file format classification, the user can enter two search requirements, a file name keyword and a file format, and the query system matches and outputs the files that meet both requirements.
15. The sorting and searching method for files according to any one of claims 1-3, 5, 6 and 11-14, characterized in that: using the query-frequency-based query classification method, the user can select the required file query link from the list of frequently searched files provided by the query system, which is arranged by query frequency within each category.
16. A sorting and searching method for files, in which the user inputs a query request and the computer returns the files that satisfy the query request, characterized in that: the computer's query combines file classification with the query-frequency-based manual classification of queries; wherein the file-extension-based file format classification query method means that the query request entered by the user comprises two parts, a file name keyword and a file format type, and the query searches for files within the specified format category without the user having to specify a file extension; and the query-frequency-based manual classification method means that frequently used query URLs are made into shortcuts, so that the user only needs to click a shortcut to obtain the query results.
17. The sorting and searching method for files according to claim 16, characterized in that: in the file-extension-based file format classification query method, the extension of the file name is used as the criterion for classifying files, files are divided into various format types, and each format type corresponds to several extensions.
18. The sorting and searching method for files according to claim 16 or 17, characterized in that: in the file-extension-based file format classification query method, the file format types based on file extension include document, video, audio, image, program, source code, directory, and similar types.
19. The sorting and searching method for files according to claim 17 or 18, characterized in that: the file-extension-based file format classification follows the common understanding of which file type a given extension corresponds to; where one extension could belong to several categories, its most common category is taken.
20. The sorting and searching method for files according to any one of claims 16-19, characterized in that: in the file-extension-based file format classification query method, when the query system obtains a file entry, it uses the entry's extension to determine the corresponding file format category and stores that category in the attributes of the file entry; when the user queries for files of a specified file format type, the type number selected by the user is compared with the type number in the file entry's attributes; and from the results obtained by file name matching, the query results whose file name matches and whose file format type also matches the specified type are selected.
21. The sorting and searching method for files according to claim 16, characterized in that: in the query-frequency-based manual classification of queries, frequently used query URLs are classified at two levels: the first level classifies by file format, and the second level classifies by content within that format category.
22. The sorting and searching method for files according to claim 16 or 21, characterized in that: in the query-frequency-based manual classification of queries, a program records the number of clicks on each shortcut; when all shortcut entries of a category are displayed, they are output sorted by click count, thereby also providing a query ranking for that category.
23. The sorting and searching method for files according to any one of claims 16-20, characterized in that: using the file-extension-based file format classification, the user can enter two search requirements, a file name keyword and a file format, and the query system matches and outputs the files that meet both requirements.
24. The sorting and searching method for files according to any one of claims 16, 21 and 22, characterized in that: using the query-frequency-based query classification method, the user can select the required file from the list of frequently searched files provided by the query system, which is arranged by query frequency within each category.
CN 02100839 2002-01-30 2002-01-30 Sorting and searching method for files Pending CN1360267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02100839 CN1360267A (en) 2002-01-30 2002-01-30 Sorting and searching method for files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 02100839 CN1360267A (en) 2002-01-30 2002-01-30 Sorting and searching method for files

Publications (1)

Publication Number Publication Date
CN1360267A true CN1360267A (en) 2002-07-24

Family

ID=4739480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02100839 Pending CN1360267A (en) 2002-01-30 2002-01-30 Sorting and searching method for files

Country Status (1)

Country Link
CN (1) CN1360267A (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1329802C (en) * 2002-11-12 2007-08-01 索尼计算机娱乐公司 Method and apparatus for processing files utilizing a concept of weight so as to visually represent the files in terms of whether the weight thereof is heavy or light
CN100442284C (en) * 2004-01-14 2008-12-10 Nhn株式会社 Search system for providing information of keyword input frequency by category and method thereof
CN100456676C (en) * 2005-12-01 2009-01-28 国际商业机器公司 System and method of combining metadata of file in backup storage device
CN101124576B (en) * 2004-03-15 2010-06-16 雅虎公司 Search system and methods with integration of user annotations from a trust network
CN102073731A (en) * 2011-01-17 2011-05-25 宇龙计算机通信科技(深圳)有限公司 File processing method and terminal
CN1864129B (en) * 2003-10-23 2011-07-06 微软公司 System and method for presenting related items to a user
CN1716255B (en) * 2004-07-01 2012-01-11 微软公司 Dispersing search engine results by using page category information
CN101840400B (en) * 2009-03-19 2012-02-01 北大方正集团有限公司 Multilevel classification retrieval method and system
CN102436449A (en) * 2010-09-29 2012-05-02 腾讯科技(深圳)有限公司 Method and device for acquiring audio file name
CN101546323B (en) * 2008-03-28 2012-05-30 北京华旗资讯数码科技有限公司 Index system and search method for rapidly searching multimedia file
CN102509039A (en) * 2010-09-30 2012-06-20 微软公司 Realtime multiple engine selection and combining
CN102708197A (en) * 2012-05-16 2012-10-03 Tcl集团股份有限公司 Multimedia file management method and device
CN102955789A (en) * 2011-08-22 2013-03-06 幻音科技(深圳)有限公司 Resource display method and resource display system
CN102999601A (en) * 2012-11-20 2013-03-27 广东欧珀移动通信有限公司 Method for sorting files, and multimedia terminal
CN103324628A (en) * 2012-03-21 2013-09-25 腾讯科技(深圳)有限公司 Industry classification method and system for text publishing
CN103678295A (en) * 2012-08-29 2014-03-26 北京百度网讯科技有限公司 Method and device for providing files for user
CN104063438A (en) * 2014-06-11 2014-09-24 惠州华阳通用电子有限公司 Multimedia file searching method
CN105282495A (en) * 2014-07-11 2016-01-27 联咏科技股份有限公司 Archive searching method and image processor
CN105574062A (en) * 2015-07-01 2016-05-11 宇龙计算机通信科技(深圳)有限公司 File retrieval method and apparatus and terminal
CN106997367A (en) * 2016-01-26 2017-08-01 华为技术有限公司 Sorting technique, sorter and the categorizing system of program file
CN103699694B (en) * 2014-01-13 2017-08-29 联想(北京)有限公司 A kind of data processing method and device
CN108009204A (en) * 2017-11-02 2018-05-08 深圳市网心科技有限公司 Method and system based on extension name classification and de-redundancy
CN108460075A (en) * 2017-12-28 2018-08-28 上海顶竹通讯技术有限公司 A kind of file content search method and system
US10152491B2 (en) 2014-07-11 2018-12-11 Novatek Microelectronics Corp. File searching method and image processing device thereof
CN110008186A (en) * 2019-04-11 2019-07-12 北京启迪区块链科技发展有限公司 For file management method, device, terminal and the medium of more ftp data sources
CN114153795A (en) * 2021-11-25 2022-03-08 北京融安特智能科技股份有限公司 Method and device for intelligently calling electronic archive, electronic equipment and storage medium

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1329802C (en) * 2002-11-12 2007-08-01 索尼计算机娱乐公司 Method and apparatus for processing files utilizing a concept of weight so as to visually represent the files in terms of whether the weight thereof is heavy or light
CN1864129B (en) * 2003-10-23 2011-07-06 微软公司 System and method for presenting related items to a user
CN100442284C (en) * 2004-01-14 2008-12-10 Nhn株式会社 Search system for providing information of keyword input frequency by category and method thereof
CN101124576B (en) * 2004-03-15 2010-06-16 雅虎公司 Search system and methods with integration of user annotations from a trust network
CN1716255B (en) * 2004-07-01 2012-01-11 微软公司 Dispersing search engine results by using page category information
CN100456676C (en) * 2005-12-01 2009-01-28 国际商业机器公司 System and method of combining metadata of file in backup storage device
CN101546323B (en) * 2008-03-28 2012-05-30 北京华旗资讯数码科技有限公司 Index system and search method for rapidly searching multimedia file
CN101840400B (en) * 2009-03-19 2012-02-01 北大方正集团有限公司 Multilevel classification retrieval method and system
CN102436449A (en) * 2010-09-29 2012-05-02 腾讯科技(深圳)有限公司 Method and device for acquiring audio file name
CN102509039A (en) * 2010-09-30 2012-06-20 微软公司 Realtime multiple engine selection and combining
US8869277B2 (en) 2010-09-30 2014-10-21 Microsoft Corporation Realtime multiple engine selection and combining
CN102509039B (en) * 2010-09-30 2014-11-26 微软公司 Realtime multiple engine selection and combining
CN102073731A (en) * 2011-01-17 2011-05-25 宇龙计算机通信科技(深圳)有限公司 File processing method and terminal
CN102955789A (en) * 2011-08-22 2013-03-06 幻音科技(深圳)有限公司 Resource display method and resource display system
CN103324628A (en) * 2012-03-21 2013-09-25 腾讯科技(深圳)有限公司 Industry classification method and system for text publishing
CN102708197A (en) * 2012-05-16 2012-10-03 Tcl集团股份有限公司 Multimedia file management method and device
CN103678295A (en) * 2012-08-29 2014-03-26 北京百度网讯科技有限公司 Method and device for providing files for user
CN103678295B (en) * 2012-08-29 2017-09-19 北京音之邦文化科技有限公司 Method and device for providing files for user
CN102999601A (en) * 2012-11-20 2013-03-27 广东欧珀移动通信有限公司 Method for sorting files, and multimedia terminal
CN103699694B (en) * 2014-01-13 2017-08-29 联想(北京)有限公司 A kind of data processing method and device
CN104063438A (en) * 2014-06-11 2014-09-24 惠州华阳通用电子有限公司 Multimedia file searching method
CN105282495A (en) * 2014-07-11 2016-01-27 联咏科技股份有限公司 Archive searching method and image processor
US10152491B2 (en) 2014-07-11 2018-12-11 Novatek Microelectronics Corp. File searching method and image processing device thereof
CN105574062A (en) * 2015-07-01 2016-05-11 宇龙计算机通信科技(深圳)有限公司 File retrieval method and apparatus and terminal
CN106997367A (en) * 2016-01-26 2017-08-01 华为技术有限公司 Sorting technique, sorter and the categorizing system of program file
CN106997367B (en) * 2016-01-26 2020-05-08 华为技术有限公司 Program file classification method, classification device and classification system
US10762194B2 (en) 2016-01-26 2020-09-01 Huawei Technologies Co., Ltd. Program file classification method, program file classification apparatus, and program file classification system
CN108009204A (en) * 2017-11-02 2018-05-08 深圳市网心科技有限公司 Method and system based on extension name classification and de-redundancy
CN108460075A (en) * 2017-12-28 2018-08-28 上海顶竹通讯技术有限公司 A kind of file content search method and system
CN108460075B (en) * 2017-12-28 2021-11-30 上海顶竹通讯技术有限公司 File content retrieval method and system
CN110008186A (en) * 2019-04-11 2019-07-12 北京启迪区块链科技发展有限公司 For file management method, device, terminal and the medium of more ftp data sources
CN114153795A (en) * 2021-11-25 2022-03-08 北京融安特智能科技股份有限公司 Method and device for intelligently calling electronic archive, electronic equipment and storage medium
CN114153795B (en) * 2021-11-25 2023-02-10 北京融安特智能科技股份有限公司 Method and device for intelligently calling electronic archive, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN1360267A (en) Sorting and searching method for files
CN1310175C (en) International information search and delivery system providing search results personalized to a particular natural language
Glover et al. Architecture of a metasearch engine that supports user information needs
KR101183312B1 (en) Dispersing search engine results by using page category information
Ma et al. Efficiently finding web services using a clustering semantic approach
Li et al. Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data
US7945579B1 (en) Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
US8832058B1 (en) Systems and methods for syndicating and hosting customized news content
US6792414B2 (en) Generalized keyword matching for keyword based searching over relational databases
US7783626B2 (en) Pipelined architecture for global analysis and index building
US8856096B2 (en) Extending keyword searching to syntactically and semantically annotated data
US20040205044A1 (en) Method for storing inverted index, method for on-line updating the same and inverted index mechanism
RU2236699C1 (en) Method for searching and selecting information with increased relevance
US20030088715A1 (en) System for keyword based searching over relational databases
US20040267693A1 (en) Method and system for evaluating the suitability of metadata
US20100121790A1 (en) Method, apparatus and computer program product for categorizing web content
US7024405B2 (en) Method and apparatus for improved internet searching
US20070033228A1 (en) System and method for dynamically ranking items of audio content
US8661069B1 (en) Predictive-based clustering with representative redirect targets
Jadidoleslamy Introduction to metasearch engines and result merging strategies: a survey
EP2083364A1 (en) Method for retrieving a document, a computer-readable medium, a computer program product, and a system that facilitates retrieving a document
CN1763739A (en) Search method based on semantics in search engine
Silva Searching and archiving the web with tumba
Kosmynin From bookmark managers to distributed indexing: an evolutionary way to the next generation of search engines
KR20020001960A (en) Search method of Broadcast and multimedia file on Internet

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication