CN103177122B - Personal desktop document searching method based on synonyms - Google Patents
Personal desktop document searching method based on synonyms
- Publication number
- CN103177122B (application CN201310128267.4A)
- Authority
- CN
- China
- Prior art keywords
- word
- synonym
- personal
- file
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention provides a personal document searching method based on synonyms. The method comprises the following steps: segmenting the file names in a data set into words with a conventional tokenizer; looking up each word on an online dictionary website and using web crawling to extract the synonym and near-synonym information that the website returns, thereby obtaining a user-personalized synonym table; then, for an input keyword, using string matching combined with the synonym table to return the files whose names contain the search word or one of its synonyms, sorted by the user's preference degree for the words contained in the file names. The method combines personal desktop files with synonyms and offers a solution to the file query problem in personal data management. It is concise, practical, and easy to implement; it can also greatly reduce the time users spend searching for files, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.
Description
Technical field
The present invention relates to the field of personal information management, and more particularly to a synonym-based personal document searching method.
Background art
The development of digital technology and the web has sharply increased the amount of information people handle every day, while people's attention and the time they can spend on data management remain essentially unchanged. Personal dataspace management has therefore become an important research problem. Broadly defined, personal information management covers both the management of information held in a person's memory and the management of external information. As information technology develops, information resources come in more and more types and forms, and the methods formerly used for paper-based information are no longer suitable; methods for automating information management in collection, arrangement, organization, retrieval, and so on need to be explored. Meanwhile, the spread of the PC has greatly strengthened people's ability to process and manage information. Personal information management has developed across many disciplines, including human-computer interaction, database management, information retrieval, and information science.
At present, the personal desktop file retrieval methods in common use have certain limitations. With the development of modern information technology and the Internet, the volume of information is growing massively. At the same time, storage devices are getting cheaper, so users tend to buy high-capacity storage devices to hold more personal data. Finding the information they need in this mass of data, however, takes users a long time.
The file-system-based resource browser (file explorer) is currently the most common way for people to manage and query personal desktop files. People browse the directory structure to find the data file they need. This method has the following limitation: for files that have not been used for a long time, users often cannot remember exactly where the file is stored and may need many attempts before finding it, which wastes time. Sometimes the required file cannot be found at all.
Desktop search is another commonly used way to find personal desktop files. Companies such as Google and Microsoft each have their own desktop search tools. The core of desktop search technology is to build a full-text index over desktop files so that users can find the files they need by keyword. This approach has the following limitations: first, when looking for files that have not been used for a long time, users often cannot accurately remember the required keywords; second, it does not support synonym-based queries; third, full-text indexing often results in relatively low efficiency.
Each existing personal data query method thus has its own limitations. Behavioral research shows that a person's memory of objects follows certain regular patterns, which show up in many ways. For example, a person's memory of a file name weakens gradually over time; for a file that has not been accessed for a long time, the user often forgets its storage location and only vaguely remembers some keyword contained in its file name. Current desktop search tools simply query by string matching, and some (such as Microsoft's desktop search tool) search the whole file system, including the system's installation files. Such queries not only take longer but also fail to find file names that are similar in meaning to the search keyword.
Retrieving files based on synonyms can improve both search efficiency and recall, and this is the problem the present invention addresses.
Summary of the invention
The object of the present invention is to overcome the above problems of the prior art by proposing a synonym-based personal document searching method. The method was proposed after the inventors monitored users' desktop behavior with a prototype system they developed and then collected and analyzed a large amount of data. It mainly addresses the problem that users cannot query effectively for files they have not accessed for a long time, because they no longer clearly remember the storage location or the exact keywords. For example, suppose a user wants to find an article about indexing that was read earlier and stored on a personal computer. When naming the file, the user may have used the Chinese word for "paper" or "article", or the English word "Paper" or "Article". To find the article, the user then has to try several keywords in turn, which wastes a lot of time. Synonym-based querying solves this problem.
The present invention addresses the management of files on a personal computer. On top of keyword-based querying, it takes the synonym relations of the query keyword into account, so that the string-matching range of a traditional desktop search tool is extended. The synonym-based personal document searching method provided by the present invention comprises the following steps:
Step 1. Segment the file names in the data set collected by the prototype system into words with an existing word segmentation tool, filter out the resulting words that carry no practical meaning or that contain digits, and store the words corresponding to each file name in a database as the user's word list (such as Table A in Fig. 6);
Step 2. After segmentation, match the words of the file names against synonyms, using an online dictionary website for the matching;
Step 2.1. Traverse all the words and submit each one as a search word to the online dictionary website;
Step 2.2. The website returns a result page for the word that contains information such as its basic definition, synonyms, near-synonyms, and antonyms; use web crawling to extract the word's synonym and near-synonym information;
Step 2.3. For each word among the crawled synonyms and near-synonyms, traverse the user's segmented word list (such as Table A in Fig. 6); if the word list contains the crawled word, store it together with the search word in the database as a pair of related words, forming the synonym table (such as Table B in Fig. 6);
Step 3. Query based on the input keyword, using string matching combined with the corresponding synonyms;
Step 3.1. Input a keyword K for querying desktop files;
Step 3.2. Look up keyword K in Table B in Fig. 6 to obtain its synonym set S;
Step 3.3. Take the keyword together with the synonyms found as the search keywords for querying files, forming the set SK;
Step 3.4. Traverse each word in the set SK and look up its corresponding file names in the user's word list (such as Table A in Fig. 6);
Step 3.5. Return the query results (as shown in Fig. 10).
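The extraction part of step 2.2 can be sketched as follows. This is only an illustration: the patent does not name the dictionary website or describe its page layout, so the `<li class="synonym">` markup, the `SynonymPageParser` class, and the sample page are all hypothetical, and fetching the page over the network is left out.

```python
from html.parser import HTMLParser


class SynonymPageParser(HTMLParser):
    """Collects the text of <li class="synonym"> items (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_synonym = False
        self.synonyms = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "synonym") in attrs:
            self._in_synonym = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_synonym = False

    def handle_data(self, data):
        text = data.strip().lower()
        if self._in_synonym and text:
            self.synonyms.append(text)


def extract_synonyms(html):
    """Return the synonyms listed on one dictionary result page."""
    parser = SynonymPageParser()
    parser.feed(html)
    parser.close()
    return parser.synonyms


# Hypothetical result page for the search word "paper".
SAMPLE_PAGE = """
<html><body>
<h1>paper</h1>
<ul>
  <li class="synonym">article</li>
  <li class="synonym">essay</li>
  <li class="antonym">none</li>
</ul>
</body></html>
"""

print(extract_synonyms(SAMPLE_PAGE))  # ['article', 'essay']
```

A real dictionary site would need its own selectors, and near-synonyms would be collected the same way from whatever element the site uses for them.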
Advantages and beneficial effects of the present invention:
The present invention combines personal desktop files with synonyms and proposes a solution to the file query problem in personal data management. The method is original: it can be integrated into existing personal management tools such as search engines, and the technique can also be used in meta search engines.
The method is novel, concise, practical, and easy to implement. It can also greatly reduce the time users spend searching for files, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.
Brief description of the drawings
Fig. 1 is a block diagram of the synonym-based personal document searching method of the present invention;
Fig. 2 is a detailed flowchart of the file name segmentation step of the present invention;
Fig. 3 is a detailed flowchart of the synonym graph construction step of the present invention;
Fig. 4 is a detailed flowchart of the query step of the present invention;
Fig. 5 shows part of a user's data from the data set used in the present invention;
Fig. 6 shows the result of segmenting the file names in Fig. 5 (Table A) and the data table that stores the corresponding synonyms (Table B);
Fig. 7 shows the word preference degrees calculated for the words obtained by segmenting the file names in Fig. 5;
Fig. 8 is the synonym graph constructed from Fig. 7;
Fig. 9 shows the words obtained from file name segmentation and their occurrence counts;
Fig. 10 shows the search results in the embodiment.
For a fuller understanding of the present invention and its advantages, the invention is explained further below with reference to the drawings and a specific embodiment.
Detailed description
Several concepts used in the present invention
Personal desktop file (Personal Desktop File):
A personal desktop file is a file that the user accesses on a PC, excluding system files. For example, a document or a picture can be regarded as a personal desktop file.
Personal desktop vocabulary (Personal Desktop Vocabulary):
The personal desktop vocabulary is the set of words contained in the file names of personal desktop files, excluding words that contain digits and words that carry no practical meaning.
Word preference degree (Word Preference Degree):
The word preference degree reflects how many times a word is used in the file names across all personal desktop files.
Desktop synonym graph (Desktop Synonym Graph):
The nodes of the desktop synonym graph are the words obtained by segmenting the file names of personal desktop files, together with their synonyms found through the online dictionary website; an edge of the desktop synonym graph indicates that its two nodes are synonyms.
File keyword vector (File Keyword Vector):
The file keyword vector is the vector formed by the words contained in a file's name.
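One way to picture how these concepts fit together is as a small in-memory container. The sketch below is an assumption for illustration only: the patent stores this data in database tables, and the class name `PersonalDesktopIndex`, its fields, and the second file name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalDesktopIndex:
    # Personal desktop vocabulary: word -> number of occurrences in file names.
    vocabulary: dict = field(default_factory=dict)
    # Desktop synonym graph: word -> set of synonym words (filled in elsewhere).
    synonym_graph: dict = field(default_factory=dict)
    # File keyword vectors: file name -> list of vocabulary words in that name.
    keyword_vectors: dict = field(default_factory=dict)

    def add_file(self, file_name, words):
        """Record one personal desktop file and the words its name contains."""
        self.keyword_vectors[file_name] = list(words)
        for word in words:
            self.vocabulary[word] = self.vocabulary.get(word, 0) + 1


index = PersonalDesktopIndex()
index.add_file("A paper on indexing dataspace.pdf", ["paper", "indexing", "dataspace"])
index.add_file("paper draft.doc", ["paper", "draft"])  # hypothetical file
print(index.vocabulary["paper"])  # 2
```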
Embodiment 1
Below, an example is used to illustrate the synonym-based personal document searching method and the concepts above.
First, file name segmentation
For the file set in Fig. 5, running the word segmentation tool yields the words corresponding to each file name. The number of occurrences of each word can be counted at the same time, as shown in Fig. 9; this represents one user's personal desktop vocabulary.
For example, based on the portion of the user's personal desktop files shown in Fig. 5, the file names can be segmented; the result is shown in Table A in Fig. 6, where each file name is mapped to the words it contains.
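The segmentation and counting step can be sketched roughly as below. The patent relies on an existing word segmentation tool that it does not name; here a plain regular-expression split stands in for it, which only handles ASCII words. The stop-word list, the function names, and the second file name are assumptions made for the example.

```python
import os
import re
from collections import Counter

# Hypothetical stop-word list standing in for "words with no practical meaning".
STOP_WORDS = {"a", "an", "the", "on", "of", "for", "in", "and", "to"}


def segment_file_name(file_name):
    """Split one file name into lowercase words, dropping noise words."""
    stem, _extension = os.path.splitext(file_name)
    tokens = re.split(r"[^0-9A-Za-z]+", stem.lower())
    return [
        token
        for token in tokens
        if token                                   # skip empty pieces
        and token not in STOP_WORDS                # no practical meaning
        and not any(ch.isdigit() for ch in token)  # contains a digit
    ]


def build_word_list(file_names):
    """Return (Table A: file name -> words, word -> occurrence count)."""
    table_a = {name: segment_file_name(name) for name in file_names}
    counts = Counter(word for words in table_a.values() for word in words)
    return table_a, counts


table_a, counts = build_word_list(
    ["A paper on indexing dataspace.pdf", "article_v2_final.doc"]
)
print(table_a["A paper on indexing dataspace.pdf"])  # ['paper', 'indexing', 'dataspace']
print(table_a["article_v2_final.doc"])               # ['article', 'final']
```

For Chinese file names a dedicated segmenter would replace the regular expression; the filtering and counting would stay the same.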
Second, building the keyword synonym graph
For each word in Fig. 9, its synonyms can be obtained from the online dictionary website, and Table A in Fig. 6 is then checked for each synonym. If the synonym is present, the pair is stored in the database, as in Table B in Fig. 6.
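The filtering that produces Table B might look like the sketch below. The online dictionary is replaced by a small hard-coded stand-in (`FAKE_DICTIONARY` and `fake_lookup` are hypothetical), and the pairs are kept in a Python list rather than a database table.

```python
def build_synonym_table(vocabulary, lookup_synonyms):
    """Keep only synonym pairs whose two words both occur in the user's vocabulary."""
    pairs = set()
    for word in vocabulary:
        for candidate in lookup_synonyms(word):
            if candidate in vocabulary and candidate != word:
                # Sort each pair so (a, b) and (b, a) are stored once.
                pairs.add(tuple(sorted((word, candidate))))
    return sorted(pairs)


# Hypothetical offline stand-in for the online dictionary website.
FAKE_DICTIONARY = {
    "paper": ["article", "essay"],
    "article": ["paper", "item"],
    "indexing": ["cataloguing"],
}


def fake_lookup(word):
    return FAKE_DICTIONARY.get(word, [])


vocabulary = {"paper", "article", "indexing", "dataspace"}
table_b = build_synonym_table(vocabulary, fake_lookup)
print(table_b)  # [('article', 'paper')]
```

Synonyms such as "essay" are dropped because no file name of this user contains them, which is what makes the table user-specific.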
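The word preference degree of each word in the personal desktop vocabulary can be calculated with the following formula:

$$\mathrm{Preference}(w_i) = \frac{w_i.\mathrm{Times}}{\sum_{j=1}^{n} w_j.\mathrm{Times}}$$

Here the denominator $\sum_{j=1}^{n} w_j.\mathrm{Times}$ is the total number of occurrences of all $n$ words in the synonym group, and the numerator $w_i.\mathrm{Times}$ is the number of occurrences of the individual synonym $w_i$ in that group, as shown in Fig. 7.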
From the personal desktop vocabulary obtained in Fig. 9, the desktop synonym graph can be built, as shown in Fig. 8. Words that have no synonyms are omitted from the graph; only words with synonyms are kept.
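A desktop synonym graph of this kind can be sketched as an adjacency map built from the synonym pairs, so words without synonyms never enter it. Reading synonym groups off the graph as connected components is an assumption of this sketch, not something the patent states; the function names and sample pairs are hypothetical.

```python
from collections import deque


def build_synonym_graph(pairs):
    """Undirected graph: each word maps to the set of its synonyms."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def synonym_groups(graph):
    """Group words by connected component using breadth-first search."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


pairs = [("paper", "article"), ("article", "essay"), ("photo", "picture")]
graph = build_synonym_graph(pairs)
print(synonym_groups(graph))
# [['article', 'essay', 'paper'], ['photo', 'picture']]
```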
Take the file "A paper on indexing dataspace.pdf" as an example. The words paper, indexing, and dataspace are all present in the personal desktop vocabulary, and their word preference degrees can be calculated: (paper, 0.40), (indexing, 0.50), (dataspace, 0.50). For instance, the words of the synonym group (paper, article, and the Chinese word for "paper") occur 2, 1, and 2 times respectively in the user's whole word list, so the preference degree of paper is 2 / (2 + 1 + 2) = 0.40. The resulting file keyword vector is (indexing, dataspace, paper), that is, the words sorted by their word preference degree.
Third, querying
1. Suppose the user wants to find articles with the keyword "article". After the user enters "article", its synonyms are first looked up in Table B in Fig. 6, which yields "paper" and the Chinese word for "paper";
2. Using string matching, Table A in Fig. 6 is searched for files whose names contain "article", "paper", or the Chinese word for "paper". Five files are returned, because the name of each of these five files contains one of the three words;
3. The files are sorted by the user's word preference degree, giving the result shown in Fig. 10.
As the above shows, the method of the present invention is novel, concise, practical, and easy to implement. It can also greatly reduce the time users spend searching for files, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.
Other advantages and modifications will be readily apparent to those of ordinary skill in the art. Therefore, the invention in its broader aspects is not limited to the specific details and exemplary embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit and scope of the general inventive concept as defined by the appended claims and their equivalents.
Claims (1)
1. A synonym-based personal desktop file search method, characterized in that the method comprises:
Step 1. segmenting the file names in the data set collected by a prototype system into words with an existing word segmentation tool, filtering out the resulting words that carry no practical meaning or that contain digits, and storing the words corresponding to each file name in a database as the user's word list;
Step 2. after segmentation, matching the words of the file names against synonyms using an online dictionary website;
Step 2.1. traversing all the words and submitting each one as a search word to the online dictionary website;
Step 2.2. the website returning a result page for the word, the page containing the word's basic definition, synonyms, near-synonyms, and antonyms, and using web crawling to extract the word's synonym and near-synonym information;
Step 2.3. for each word among the crawled synonyms and near-synonyms, traversing the user's segmented word list, and, if the word list contains the crawled word, storing the search word and its synonym in the database as a pair of related words, forming the synonym table;
Step 3. querying based on the input keyword, using string matching combined with the corresponding synonyms;
Step 3.1. inputting a keyword K for querying desktop files;
Step 3.2. looking up the keyword in the synonym table of the database to obtain its synonym set S;
Step 3.3. taking the keyword together with the synonyms found as the search keywords for querying files, forming the set SK;
Step 3.4. traversing each word in the set SK and looking up the corresponding file names in the user's word list in the database;
Step 3.5. returning the query results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310128267.4A CN103177122B (en) | 2013-04-15 | 2013-04-15 | Personal desktop document searching method based on synonyms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310128267.4A CN103177122B (en) | 2013-04-15 | 2013-04-15 | Personal desktop document searching method based on synonyms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103177122A (en) | 2013-06-26 |
CN103177122B (en) | 2017-04-26 |
Family ID=48636983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310128267.4A Expired - Fee Related CN103177122B (en) | 2013-04-15 | 2013-04-15 | Personal desktop document searching method based on synonyms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103177122B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105912606A (en) * | 2016-04-05 | 2016-08-31 | 湖南人文科技学院 | Synonym expansion based relational database keyword search method |
CN108108373B (en) | 2016-11-25 | 2020-09-25 | 阿里巴巴集团控股有限公司 | Name matching method and device |
CN112907398A (en) * | 2019-02-20 | 2021-06-04 | 深圳大维理文科技有限公司 | Inventor identification method and inventor identification system |
CN112256822A (en) * | 2020-10-21 | 2021-01-22 | 平安科技(深圳)有限公司 | Text search method and device, computer equipment and storage medium |
CN118227776B (en) * | 2024-05-23 | 2024-07-23 | 四川省肿瘤医院 | Disease science popularization method and system based on artificial intelligence |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0730765B1 (en) * | 1993-11-22 | 2003-09-17 | Lexis-Nexis, A Division Of Reed Elsevier Inc. | Associative text search and retrieval system |
CN101350027A (en) * | 2007-07-19 | 2009-01-21 | 富士胶片株式会社 | Content retrieving device and retrieving method |
CN102722498A (en) * | 2011-03-31 | 2012-10-10 | 北京百度网讯科技有限公司 | Search engine and implementation method thereof |
Non-Patent Citations (1)
Title |
---|
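Ontology semantic information retrieval based on natural language understanding; Zhang Zongren (张宗仁); CNKI China Excellent Master's Theses Full-text Database; 2011-10-15; Sections 2.3, 4.1-4.2, 5.2, and 5.4, Fig. 6-2 * |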
Also Published As
Publication number | Publication date |
---|---|
CN103177122A (en) | 2013-06-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20170426 Termination date: 20210415 |