File search system and method thereof
Technical field
The present invention relates to a file search system and method, and in particular to a file search system and method for a local computer or a local storage medium.
Background art
In this age of information explosion, people routinely store and process large numbers of files and data on computers or over networks. Faced with such volumes of electronic data, efficiently finding the part a user needs has become one of the most frequent operations performed by computer and network users. Search currently falls into two broad categories: web search engines and desktop search. A web search engine searches web page content or files on the World Wide Web: it crawls content with a web spider, indexes that content using specific algorithms, and may additionally rank the results with ranking techniques such as page ranking before presenting them to the user. Well-known web search engines include Google, Yahoo, and Baidu.
Desktop search, by contrast, searches files on a local computer or a local storage medium, such as e-mail, text files, Office documents, and browser cache files. Well-known products include the file search built into Microsoft's Windows operating system, Google Desktop Search, and Yahoo Widget Engine. Because the search function built into Windows has steadily matured, companies such as Google and Yahoo gradually withdrew from this field after 2011. However, Microsoft's built-in file search still considers only the relevance between file content and keywords, which does not fully meet user needs. For instance, when a user forgets the keyword or cannot supply an accurate one, finding the file becomes difficult.
Summary of the invention
One aspect of the present invention is to provide a file search system and method that consider both the relevance between files and keywords and the user's habitual behavior, so as to provide better search results.
Another aspect of the present invention is to provide a file search system and method that record and analyze the user's operations on files, so as to provide search results that better match the user's needs.
A further aspect of the present invention is to provide a file search system and method that still return search results when the user forgets the keyword or cannot provide an accurate one, helping the user revise the keyword and search more precisely.
According to the above aspects, a file search method is proposed for a local storage medium that stores a plurality of files and allows a user to search those files. The method comprises the following. A full-text index feature is generated for each file from the relation between the content of the file and a plurality of keywords, and is stored in a search database. When the user performs an operation on at least two of the files, or performs at least two operations on one of the files, the operation and its count are stored in the search database, and linked features between the files are generated and stored in the search database. When the user inputs a search keyword, a first search result is generated from the relation between the full-text index features in the search database and the search keyword, and a second search result is generated from the linked features in the search database and the first search result. The first and second search results are integrated to produce a ranked list of the files corresponding to the search keyword.
According to some embodiments of the present invention, the files include text files, mail files, browser cache files, and document files. The user's operations on files include open, save, switch, search, copy, paste, and link. A linked feature consists of the user's operation on two files together with the temporal correlation between the two files for that operation, and the temporal correlation includes the time order and the time interval.
According to some embodiments of the present invention, the full-text index feature of each file is generated by calculating the relation between the content of the file and a plurality of keywords using a term frequency-inverse document frequency (TF-IDF) algorithm.
According to some embodiments of the present invention, the linked features of the files are obtained by recording all operations the user performs on the files and performing a correlation analysis on them. The correlation analysis may be performed periodically, at irregular intervals, or in real time.
According to some embodiments of the present invention, the first search result is generated not only from the relation between the full-text index features in the search database and the search keyword, but also from the weight of the search keyword in the full-text index features. The step of integrating the first and second search results further comprises a weight adjustment step.
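For reference, the TF-IDF weight described in words in the embodiments (occurrences over total words, multiplied by the logarithm of total files over files containing the keyword) can be written as follows. The symbols are notational choices for this illustration: $n_{t,d}$ is the number of times keyword $t$ occurs in file $d$, the sum runs over all words $k$ of $d$, and $D$ is the set of files on the local storage medium.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log \frac{|D|}{\left|\{\, d' \in D : t \in d' \,\}\right|}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```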
According to the above aspects, a file search system is also proposed, comprising a local storage medium, an input module, a text analysis module, an association analysis module, a search database, a first search module, a second search module, an integration module, and an output module. The local storage medium stores a plurality of files, and the input module receives a search condition entered by a user. The text analysis module generates a full-text index feature for each file from the relation between the content of the file and a plurality of keywords. When the user performs an operation on at least two of the files, or performs at least two operations on one of the files, the association analysis module records the operation and its count and generates a linked feature for each file. The search database is stored in the local storage medium and contains the full-text index feature of each file, the records of the operations, and the linked features. The first search module generates a first search result according to the relation between the full-text index features in the search database and the search condition, and the second search module generates a second search result according to the linked features in the search database and the first search result. The integration module integrates the first and second search results to produce a ranked list of the files corresponding to the search condition, and the output module displays that list.
According to some embodiments of the present invention, the text analysis module generates the full-text index feature of each file by calculating the relation between the content of the file and a plurality of keywords using a TF-IDF algorithm.
According to some embodiments of the present invention, the association analysis module further comprises a user behavior collection module, which records all operations the user performs on the files, and a user behavior analysis module, which performs a correlation analysis on the content recorded by the user behavior collection module to generate the linked features. The correlation analysis may be performed periodically, at irregular intervals, or in real time.
According to some embodiments of the present invention, the first search module generates the first search result not only from the relation between the full-text index features in the search database and the search condition, but also from the weight of the search condition in the full-text index features.
According to some embodiments of the present invention, the integration module further comprises a weight adjusting module for adjusting the weights between the first and second search results.
With the file search system and method of the present invention, the user's operations on files are recorded to establish linked features between files, so that a search can consider both the relevance between files and keywords and the linked features, providing better search results.
Because the file search system and method record and analyze the user's operations on files, the ranking of search results takes the associations between files into account, yielding results closer to the user's needs.
Even when the user forgets the keyword or cannot provide an accurate one, the file search system and method can still return search results by searching the linked features, and can present a ranking close to what the intended keyword would have produced, helping the user revise the keyword and search more precisely.
Brief description of the drawings
Fig. 1 is a block diagram of a file search system according to an embodiment of the invention;
Fig. 2 is a flowchart of a file search method according to an embodiment of the invention.
The advantages, spirit, and features of the present invention are described in detail below through embodiments with reference to the accompanying drawings. Note that, to make the invention easier to understand, the drawings are schematic only and are not drawn to scale.
[Description of reference numerals]
100: block diagram of the file search system
110: local storage medium
112: file
114: search database
116: operation record
118: linked feature
120: full-text index feature
122: input module
124: text analysis module
126: association analysis module
128: user behavior collection module
130: user behavior analysis module
132: first search module
134: second search module
136: integration module
138: output module
140: weight adjusting module
200: flowchart of the file search method
210: learning phase
212, 214: steps
220: search phase
222, 224, 226, 228: steps
Embodiment
To make the advantages, spirit, and features of the present invention easier and clearer to understand, they are described and discussed in detail below through embodiments with reference to the accompanying drawings. Note that these embodiments are merely representative of the invention; the specific methods, devices, conditions, and materials given as examples are not intended to limit the invention or the corresponding embodiments.
Refer to Fig. 1 and Fig. 2. Fig. 1 is a block diagram 100 of a file search system according to an embodiment of the invention, and Fig. 2 is a flowchart 200 of a file search method according to an embodiment of the invention. The file search system and method of the present invention are deployed on a local computer or a local storage medium and allow the user to search these local files. The local computer may be a personal computer, a server, or the like, and includes at least a local storage medium 110, such as a hard disk, a solid-state drive (SSD), or a disk array (RAID). The user can store many files 112 on the local storage medium 110: installing software, creating or editing files, copying files from other media, and downloading files from the Internet all produce files 112 that are stored or cached on the local storage medium 110. These files 112 include e-mail, text files, Office documents, browser cache files, and the like.
The file search method of the present invention is divided into two main phases: a learning phase 210 and a search phase 220. The learning phase 210 is carried out mainly by the text analysis module 124 and the association analysis module 126. The user usually searches the files 112 by keyword through an input module 122, such as a keyboard, a touch panel, or a voice input module, for example entering the keyword "Inventec Appliances" to find files related to Inventec Appliances. The text analysis module 124 links the keywords from these search activities to the files 112, builds a full-text index feature 120 for each file 112, and stores it in the search database 114 on the local storage medium 110; this is step 212 in Fig. 2. The content recorded in the full-text index feature 120 includes, but is not limited to, the relation between keywords and file content and the weight of each keyword in the file, such as the frequency with which the keyword occurs in the file. For instance, the full-text index feature 120 of a file 112 may record the two keywords "Inventec Appliances" and "Inventec" together with the number of times each occurs. Although the technique of the invention targets local files, the full-text index features 120 built in the learning phase 210 may also draw on records and analysis of the keywords the user enters during web searches.
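As a rough sketch of the keyword-count part of a full-text index feature 120, the Python below counts, per file, how often each previously searched keyword occurs. The function names, the dictionary-based record layout, and case-insensitive whole-word matching are assumptions made for illustration only. Note that with this matching rule, "Inventec" is also counted inside every occurrence of "Inventec Appliances".

```python
import re
from collections import Counter


def keyword_counts(text: str, keywords: list[str]) -> Counter:
    """Count occurrences of each (possibly multi-word) keyword in text.

    Hypothetical helper: matching is case-insensitive and respects word
    boundaries, so "Inventec" does not match inside "Inventecs".
    """
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def build_index_feature(files: dict[str, str], keywords: list[str]) -> dict[str, Counter]:
    """Map each file path to its keyword-count record (illustrative layout)."""
    return {path: keyword_counts(text, keywords) for path, text in files.items()}
```

In a real system the keyword list would come from the user's recorded search activity, as described above, and the counts would be persisted in the search database rather than held in memory.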
In some embodiments of the present invention, the full-text index feature 120 is generated with the term frequency-inverse document frequency (TF-IDF) algorithm, whose values serve as parameters of the full-text index feature 120. Term frequency is how often a particular keyword occurs in a file, that is, the ratio of the number of occurrences of the keyword to the total number of words in the file. Inverse document frequency measures the general importance of a keyword: it is obtained by dividing the total number of files by the number of files that contain the keyword and taking the logarithm of the quotient. The product of term frequency and inverse document frequency is used as one of the parameters of the full-text index feature.
The association analysis module 126 mainly establishes the relations between files. When users operate on files they usually follow a certain line of thought or logic, and these operations reveal associations between files. For example, while writing a report, a user may search for information on the web, download it, and even copy part of its content, so the report document becomes associated with the downloaded files, and the two may share the same or similar keywords. The file associations formed by such user operations can therefore be informative during search. In some embodiments of the present invention, the association analysis module 126 includes a user behavior collection module 128 and a user behavior analysis module 130. The user behavior collection module 128 records all operations the user performs on the files 112, including but not limited to open, save, switch, search, copy, paste, and link. The operation records 116 of all user behavior can be stored in the search database 114. The user behavior analysis module 130 performs a correlation analysis on the content recorded by the user behavior collection module 128, that is, the operation records 116, generates the linked features 118 of the files 112, and stores them in the search database 114; this is step 214 in Fig. 2. The linked features 118 include at least the operation records between two files, the count or frequency of the same operation, the temporal correlation between two files for a given operation (such as time order or time interval), or two operations on the same file. The correlation analysis of the user behavior analysis module 130 may be performed periodically, at irregular intervals, or in real time to update the linked features 118 in the search database 114.
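A minimal Python sketch of the TF-IDF computation exactly as worded above: occurrences of a term divided by the file's total word count, multiplied by the logarithm of the total number of files over the number of files containing the term. The regex tokenizer and the natural logarithm are assumptions; real Chinese or mixed-language content would need a proper word segmenter.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer (assumption for this sketch only).
    return re.findall(r"\w+", text.lower())


def tf_idf(files: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return {path: {term: tf * idf}} for every term in every file."""
    tokens = {path: tokenize(text) for path, text in files.items()}
    n_files = len(files)

    # Document frequency: number of files containing each term at least once.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    weights = {}
    for path, toks in tokens.items():
        total = len(toks)        # total number of words in this file
        counts = Counter(toks)   # occurrences of each term in this file
        weights[path] = {
            term: (c / total) * math.log(n_files / df[term])
            for term, c in counts.items()
        }
    return weights
```

A term that appears in every file gets an inverse document frequency of log(1) = 0, so it contributes nothing to ranking, which is the intended effect of the "general importance" measure.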
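The correlation analysis is not specified beyond the quantities listed above. One possible sketch, assuming a time-stamped operation log and a fixed pairing window (both hypothetical), counts how often an operation on one file is followed by an operation on a different file within the window and keeps the average interval, which captures the time order and time interval mentioned above. Pairs of operations on the same file are skipped here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    timestamp: float  # seconds since some epoch
    path: str         # file that was operated on
    op: str           # e.g. "open", "save", "goto", "search", "copy", "paste", "link"


def link_features(log: list[OperationRecord], window: float = 300.0) -> dict:
    """Pairwise link features from an operation log (illustrative scheme).

    Keys are ordered tuples (earlier_path, earlier_op, later_path, later_op),
    so time order is preserved; values hold the count and mean interval.
    """
    log = sorted(log, key=lambda r: r.timestamp)
    count = defaultdict(int)
    total_gap = defaultdict(float)
    for i, first in enumerate(log):
        for second in log[i + 1:]:
            gap = second.timestamp - first.timestamp
            if gap > window:
                break  # log is sorted, so all later records are further away
            if second.path == first.path:
                continue  # same-file operation pairs are not handled here
            key = (first.path, first.op, second.path, second.op)
            count[key] += 1
            total_gap[key] += gap
    return {k: {"count": c, "mean_interval": total_gap[k] / c} for k, c in count.items()}
```

Because the keys are ordered, "open A then paste into B" and "open B then paste into A" stay separate, which keeps the time-order information available to the search phase.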
As an example, take the user's open, save, switch (goto), and search operations on files. For the files in a keyword search result, the linked features worth consulting include:
Files most recently (latest) opened, saved, switched to, or searched.
Searches performed after a given file was opened.
Switching to another file after a given file was opened.
The frequency of open, save, switch, and search operations.
Two files being opened, saved, switched to, or searched at the same time.
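The five kinds of features listed above can be tallied from the operation records 116. The sketch below is one illustrative way to do it; the `Op` record, the 60-second window used to approximate "opened at the same time", and the returned dictionary keys are all assumptions, since the text does not define them.

```python
from collections import Counter
from typing import NamedTuple


class Op(NamedTuple):
    timestamp: float
    path: str
    op: str  # "open", "save", "goto" (switch) or "search"


def behaviour_features(log, co_open_window=60.0):
    """Tally the listed feature types from an operation log (illustrative only)."""
    last_seen = {}                   # (path, op) -> most recent timestamp
    frequency = Counter()            # (path, op) -> number of occurrences
    searched_after_open = Counter()  # opened path -> searches that followed it
    open_then_goto = Counter()       # (opened path, switched-to path) -> count
    co_open = Counter()              # frozenset({a, b}) -> opened close together
    last_open = None                 # most recent "open" record seen so far
    for rec in sorted(log):          # NamedTuples sort by timestamp first
        last_seen[(rec.path, rec.op)] = rec.timestamp
        frequency[(rec.path, rec.op)] += 1
        if last_open is not None:
            if rec.op == "search":
                searched_after_open[last_open.path] += 1
            elif rec.op == "goto" and rec.path != last_open.path:
                open_then_goto[(last_open.path, rec.path)] += 1
            elif (rec.op == "open" and rec.path != last_open.path
                  and rec.timestamp - last_open.timestamp <= co_open_window):
                co_open[frozenset((last_open.path, rec.path))] += 1
        if rec.op == "open":
            last_open = rec
    return {
        "last_seen": last_seen,
        "frequency": frequency,
        "searched_after_open": searched_after_open,
        "open_then_goto": open_then_goto,
        "co_open": co_open,
    }
```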
All of these linked features should influence the ranking of files in a keyword search result. For instance, suppose a user searches for keyword A, and the file they want is one they consulted for data related to keyword A while writing an e-mail. Besides matching keyword A, the search should then consider the files that were opened at the same time as, or opened or switched to right after, the mail file containing keyword A, and this linked feature should influence the ranking. If the search for keyword A returns files B, C, D, and E, where B is a mail file, and the linked features of B show that D was relatively often opened together with B, then D should be ranked first. Of course, if the user can enter some linked-feature conditions in addition to the keyword, the system can search those linked features directly. In practice, however, users are often vague or uncertain about linked features, so the system adjusts the search result according to the number of times or frequency with which linked features occur. For instance, if a keyword search returns the five files F, G, H, I, and J, and the search database shows that file I has especially many linked-feature records with the other four files, file I can be moved forward in the ranking.
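Both examples in the preceding paragraph can be expressed as a re-ranking of the keyword result by link strength. The sketch below is one simple interpretation, not the method as claimed: link strength is the primary sort key and the original keyword order breaks ties (Python's sort is stable). The `link_count` mapping and the optional `anchor` argument, standing for a file such as the mail file B, are hypothetical.

```python
def rerank_by_links(results, link_count, anchor=None):
    """Reorder keyword results using link-feature counts (illustrative).

    results    -- file names in keyword-relevance order
    link_count -- dict mapping frozenset({a, b}) to how often a and b were
                  linked (for example, opened together)
    anchor     -- optional file the user's intent is tied to; when given,
                  only links to the anchor count toward a file's strength
    """
    def strength(f):
        if anchor is not None:
            return 0 if f == anchor else link_count.get(frozenset((f, anchor)), 0)
        # Without an anchor, use the file's total number of link records.
        return sum(c for pair, c in link_count.items() if f in pair)

    # sorted() is stable, so equal-strength files keep their keyword order.
    return sorted(results, key=lambda f: -strength(f))
```

With `anchor="B"` and a strong B-D link, D moves to the front, as in the first example; without an anchor, the file with the most link records overall (I in the second example) moves forward. A blended score that keeps some weight on keyword relevance is an equally valid reading of the text.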
Accordingly, the second phase of the file search system and method, the search phase 220, can also be divided into two parts. The user enters a search condition, such as keywords and/or linked features, through the input module 122; this is step 222 of Fig. 2. In some embodiments of the invention, the first search module 132 searches for the entered keywords against the full-text index features 120 in the search database 114 and produces a first search result (step 224 of Fig. 2). The second search module 134 then produces a second search result from the linked features 118 in the search database 114 and the first search result (step 226 of Fig. 2). If the user has entered linked features, the search is performed directly on those linked features to produce the second search result. If not, the second search result is produced from the number of times or frequency of the linked features 118 of each file in the first search result. The integration module 136 integrates the first and second search results, produces the ranked file list of the search result (step 228 of Fig. 2), and presents it to the user through the output module 138, such as a display or a voice output module. When the integration module 136 integrates the first and second search results, that is, the keyword search result and the linked-feature search result, it can adjust the weights of the two results through the weight adjusting module 140 as needed. The weight allocation in the weight adjusting module 140 can itself be derived from user behavior analysis, for example by monitoring which type of linked feature the files opened after each search belong to and adjusting the weight of that type accordingly.
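Putting the search phase together, the following sketch strings steps 224 to 228 into three functions. It assumes the full-text index is a mapping from file to TF-IDF weights and that linked features are reduced to pairwise counts; the max-normalisation and the single `link_weight` parameter standing in for the weight adjusting module 140 are illustrative choices, not specified by the text. All function names are hypothetical.

```python
def first_search(index, keywords):
    """Keyword search: sum each file's TF-IDF weights for the query terms."""
    scores = {}
    for path, weights in index.items():
        s = sum(weights.get(k, 0.0) for k in keywords)
        if s > 0:
            scores[path] = s
    return scores


def second_search(first, links):
    """Link search: score files linked to any first-stage hit.

    links maps frozenset({a, b}) -> link count. Files that lack the keyword
    entirely can still receive a score here.
    """
    scores = {}
    for pair, count in links.items():
        if len(pair) != 2:
            continue  # ignore malformed self-links
        a, b = tuple(pair)
        for hit, other in ((a, b), (b, a)):
            if hit in first:
                scores[other] = scores.get(other, 0.0) + count
    return scores


def _normalise(scores):
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else {}


def integrate(first, second, link_weight=0.3):
    """Weighted merge; link_weight plays the role of the weight adjusting module."""
    f, s = _normalise(first), _normalise(second)
    combined = {
        p: (1 - link_weight) * f.get(p, 0.0) + link_weight * s.get(p, 0.0)
        for p in set(f) | set(s)
    }
    # Highest combined score first; ties broken by file name for determinism.
    return sorted(combined, key=lambda p: (-combined[p], p))
```

Because `second_search` walks outward from the keyword hits, a file that lacks the keyword but is strongly linked to a hit can still enter the final list, and raising `link_weight` pushes such files higher.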
For example, in some embodiments of the present invention, for a search condition consisting of keyword a, the first search module 132 produces the file ranking A, B, C, D, E from the full-text index features 120 (the first search result), and the second search module 134 produces the ranking B, C, A, E, D, F, G from the linked features 118, where files F and G have relatively strong linked features with files C and A. After integration by the integration module 136, the ranking may be B, A, C, E, D, F, G (with the linked features weighted lower). This embodiment shows that the file search system and method of the present invention can find files F and G, which do not contain the keyword but are highly associated with the results. When the keyword entered by the user is inaccurate or vague, this approach can reveal that the likely keyword is present in files F and G, and the user can adjust the search condition accordingly.
Note that in some embodiments of the present invention, the first search module 132 and the second search module 134 can operate independently, with their results then integrated by the integration module 136. In addition, linked features are not limited to associations between two files or two operations; they can also record associations among multiple files or multiple operations. Those skilled in the art will appreciate that the linked features generated between files from user operations to adjust search results can take many other variations and combinations, all of which fall within the spirit and scope of the present invention.
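Extending linked features from pairs to larger groups of files can be sketched by splitting the operation log into sessions and counting which sets of files occur together. The session gap of 600 seconds, the maximum set size, the `(timestamp, path)` log format, and the function name are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def session_file_sets(log, gap=600.0, max_size=3):
    """Count co-occurring file sets of size 2..max_size within sessions.

    log -- iterable of (timestamp, path) pairs; a new session starts when
           consecutive operations are more than `gap` seconds apart.
    """
    sessions, current, last_t = [], set(), None
    for t, path in sorted(log):
        if last_t is not None and t - last_t > gap:
            sessions.append(current)
            current = set()
        current.add(path)
        last_t = t
    if current:
        sessions.append(current)

    counts = Counter()
    for files in sessions:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(files), size):
                counts[frozenset(combo)] += 1
    return counts
```

The number of subsets grows quickly with session size, which is why `max_size` caps the set size in this sketch.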
In summary, the file search system and method of the present invention record the user's operations on files to establish linked features between files, so that a search can consider both the relevance between files and keywords and the linked features, providing better search results. Because the user's operations on files are recorded and analyzed, the ranking of search results takes into account the associations between files and operations, yielding results closer to the user's needs. And even when the user forgets the keyword or cannot provide an accurate one, searching the linked features can still return highly associated results and a ranking close to what the intended keyword would produce, helping the user revise the keyword and search more precisely.
The above detailed description of preferred embodiments is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the invention to the preferred embodiments disclosed. On the contrary, it is intended to cover various changes and equivalent arrangements within the scope of the claims of the present invention. Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it; anyone skilled in the art may make various modifications and variations without departing from the spirit and scope of this invention, so the scope of protection of the present invention is defined by the appended claims.