CN103177122B - Personal desktop document searching method based on synonyms - Google Patents

Personal desktop document searching method based on synonyms

Info

Publication number
CN103177122B
Authority
CN
China
Prior art keywords
word
synonym
personal
file
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310128267.4A
Other languages
Chinese (zh)
Other versions
CN103177122A (en)
Inventor
李玉坤
赵喜燕
赵德新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Technology
Priority to CN201310128267.4A
Publication of CN103177122A
Application granted
Publication of CN103177122B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a personal document search method based on synonyms. The method comprises the following steps: segmenting the file names in a data set into words with an existing word segmentation tool; looking up each resulting word on an online dictionary website and using web crawling to extract the synonyms and near-synonyms the website returns, thereby building a synonym table personalized to the user; then, for an input keyword, using string matching together with the synonym table to return every file whose name contains the search word or one of its synonyms, and sorting the results by the user's preference degree for the words contained in the file names. The method combines personal desktop files with synonyms and offers a solution to the file query problem in personal data management. It is concise, practical, and easy to implement; it can greatly reduce the time users spend searching for files, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.

Description

A personal desktop file search method based on synonyms
Technical field
The present invention relates to the field of personal information management, and more particularly to a personal document search method based on synonyms.
Background
The development of digital technology and the web has sharply increased the amount of information people process every day, while people's attention and the time available for data management have remained essentially unchanged. Personal dataspace management is therefore becoming an important research problem. Broadly defined, personal information management covers both the management of information held in a person's memory and the management of external information. As information technology develops, information resources come in more and more types and forms, and the methods formerly used for paper-based information no longer apply; automated methods are needed for collecting, arranging, organizing, and retrieving information. At the same time, the spread of personal computers has greatly strengthened people's ability to process and manage information. Personal information management has developed across many disciplines, including human-computer interaction, database management, information retrieval, and information science.
At present, the commonly used methods for retrieving personal desktop documents have certain limitations. With the development of modern information technology and the Internet, the volume of information has grown massively. Meanwhile, storage devices have become cheaper, so users tend to buy high-capacity storage devices to hold more personal data. Finding the information a user needs within such a mass of data, however, takes a long time.
The file-system-based resource browser is currently the most common way for people to manage and query personal desktop files. People browse the directory structure to find the data file they need. This method has the following limitation: for files that have not been used for a long time, users often cannot remember exactly where the file is stored and may need many attempts to find it, which wastes time. Sometimes the file cannot be found at all.
Desktop search is another commonly used way to find personal desktop files; Google and Microsoft, for example, each have their own desktop search tool. The core of desktop search technology is to build a full-text index over desktop files so that users can find the files they need by keyword. This method has the following limitations: first, when looking for files that have not been used for a long time, users often cannot accurately remember the required keywords; second, it does not support synonym-based queries; third, full-text indexing often results in relatively low efficiency.
Existing personal data query methods thus each have their limitations. Behavioral research shows that a person's memory follows certain regularities, which show up in several ways. For example, a person's memory of a file name gradually weakens over time; for a file that has not been accessed for a long time, the user often forgets its storage location and only vaguely remembers some keyword contained in its file name. Current desktop search tools query purely by string matching, and some of them (such as Microsoft's desktop search tool) need to search the entire file system, including system installation files. This kind of query not only takes a long time but also fails to find file names that are similar in meaning to the search keyword.
Retrieving files based on synonyms can improve both search efficiency and recall, and the present invention addresses exactly this problem.
Summary of the invention
The purpose of the present invention is to overcome the above problems of the prior art by proposing a personal document search method based on synonyms. The invention was proposed after the inventors monitored users' desktop behavior with a prototype system they developed, collected a large amount of data, and analyzed it. It mainly addresses the problem that users cannot query effectively for files they have not accessed for a long time, because they no longer clearly remember the storage location or the exact keywords. For example, suppose a user wants to find an article about indexing that they read in the past and stored on their personal computer. When naming the file, the user may have used the Chinese word for "paper" or for "article", or the English word "Paper" or "Article". To find the article, the user has to try several keywords in turn, which wastes a lot of time. Synonym-based querying solves this problem.
The present invention addresses the management of files on a personal computer. On top of keyword-based querying, it takes the synonym relations of the query keyword into account, thereby extending the string-matching range of traditional desktop search tools. The synonym-based personal document search method provided by the invention comprises the following steps:
Step 1: Segment the file names in the data set collected by the prototype system into words using an existing word segmentation tool, filter out the resulting words that have no practical meaning or that contain digits, and store the words corresponding to each file name in a database as the user's word list (Table A in Fig. 6);
Step 2: After segmentation, match synonyms for the words of the file names using an online dictionary website;
Step 2.1: Traverse all words, submitting each word as a search term to the online dictionary website;
Step 2.2: The website returns a result page for the word that contains its basic definition, synonyms, near-synonyms, antonyms, and similar information; use web crawling to extract the word's synonyms and near-synonyms;
Step 2.3: For each crawled synonym or near-synonym, traverse the user's word list obtained after segmentation (Table A in Fig. 6); if the word list contains the crawled word, store it together with the search word in the database as a pair of related words, forming the synonym table (Table B in Fig. 6);
Step 3: Query based on the input keyword, using string matching combined with the corresponding synonyms;
Step 3.1: Input a keyword K for querying desktop files;
Step 3.2: Look up Table B in Fig. 6 to obtain the synonym set S corresponding to the keyword;
Step 3.3: Take the keyword together with the synonyms found as the search keywords of the document query, forming the set SK;
Step 3.4: Traverse each word in the set SK and look up its corresponding file names in the user's word list (Table A in Fig. 6);
Step 3.5: Return the query results (as shown in Fig. 10).
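The query steps (3.1 to 3.5) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the patented implementation: the names `table_a`, `table_b`, `expand_query`, and `search` are hypothetical, in-memory dictionaries stand in for the database tables (Table A and Table B), and matching is done on whole segmented words, which is one plausible reading of the string-matching step.

```python
def expand_query(keyword, synonym_table):
    """Return the search set SK: the keyword K plus its synonym set S."""
    return {keyword} | set(synonym_table.get(keyword, ()))


def search(keyword, word_list, synonym_table):
    """Return the file names whose segmented words intersect SK."""
    sk = expand_query(keyword, synonym_table)
    return [name for name, words in word_list.items() if sk & set(words)]


# Hypothetical stand-ins for Table A (file name -> words)
# and Table B (word -> synonyms found in the user's word list).
table_a = {
    "A paper on indexing dataspace.pdf": ["paper", "indexing", "dataspace"],
    "survey article.doc": ["survey", "article"],
    "holiday photo.jpg": ["holiday", "photo"],
}
table_b = {"article": ["paper"], "paper": ["article"]}

print(search("article", table_a, table_b))
```

Because the keyword is expanded before matching, a query for "article" also reaches the file that was named with "paper", which is the behavior the method is designed to provide.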
Advantages of the present invention:
The invention combines personal desktop files with synonyms and proposes a solution to the file query problem in personal data management. The method is original; it can be integrated into existing personal information management tools such as search engines, and the technique can also be used in metasearch engines.
The method is novel, concise, practical, and easy to implement. It can also greatly reduce users' file search time, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.
Description of the drawings
Fig. 1 is a block diagram of the synonym-based personal document search method of the present invention;
Fig. 2 is a detailed flowchart of the file name segmentation step;
Fig. 3 is a detailed flowchart of the synonym graph construction step;
Fig. 4 is a detailed flowchart of the query step;
Fig. 5 shows part of one user's data in the data set used in the present invention;
Fig. 6 shows the result of segmenting the file names in Fig. 5 (Table A) and the data table storing the corresponding synonyms (Table B);
Fig. 7 shows the word preference degrees calculated for the words obtained by segmenting the file names in Fig. 5;
Fig. 8 is the synonym graph constructed from Fig. 7;
Fig. 9 shows the words obtained from file name segmentation and their occurrence counts;
Fig. 10 shows the search results of the embodiment.
For a fuller understanding of the present invention and its advantages, the invention is further explained below with reference to the drawings and a specific embodiment.
Specific embodiments
Concepts used in the present invention
Personal desktop file (Personal Desktop File):
A personal desktop file is a file that the user accesses on a personal computer, excluding system files. For example, a document or a picture can be regarded as a personal desktop file.
Personal desktop vocabulary (Personal Desktop Vocabulary):
The personal desktop vocabulary is the set of words contained in the file names of the personal desktop files, excluding words that contain digits or have no practical meaning.
Word preference degree (Word Preference Degree):
The word preference degree refers to how often a word is used in the file names of all personal desktop files.
Desktop synonym graph (Desktop Synonym Graph):
The nodes of the desktop synonym graph are the words obtained by segmenting the file names of the personal desktop files, together with their synonyms found through the online dictionary website; an edge between two nodes indicates that the two words are synonyms.
File keyword vector (File Keyword Vector):
The file keyword vector is the vector formed by the words contained in a file's name.
Embodiment 1
The synonym-based personal document search method is illustrated below with an example, which also serves to illustrate the concepts above.
First, file name segmentation
Applying the word segmentation tool to the file set in Fig. 5 yields the words corresponding to each file name. The number of occurrences of each word can be counted at the same time, as shown in Fig. 9; the result represents one user's personal desktop vocabulary.
For example, segmenting the file names of the user's personal desktop files shown in Fig. 5 gives the result shown in Table A of Fig. 6, where each file name is mapped to the words it contains.
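The segmentation and filtering step can be sketched as follows. The source only says that an existing word segmentation tool is used, so this sketch swaps in a simple regular-expression split that handles Latin-script file names only; Chinese file names would need a real segmenter. The stop-word list `STOP_WORDS` and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed list of words "without practical meaning"; the source does not give one.
STOP_WORDS = {"a", "an", "the", "on", "of", "for", "and", "in", "to"}


def segment_file_name(file_name):
    """Split a file name into lower-case words, dropping the extension,
    stop words, and any word that contains a digit."""
    stem = file_name.rsplit(".", 1)[0] if "." in file_name else file_name
    tokens = re.split(r"[^0-9A-Za-z]+", stem.lower())
    return [
        token
        for token in tokens
        if token and token not in STOP_WORDS and not any(ch.isdigit() for ch in token)
    ]


def build_word_list(file_names):
    """Return (file name -> words) plus the occurrence count of every word."""
    word_list = {name: segment_file_name(name) for name in file_names}
    counts = Counter(word for words in word_list.values() for word in words)
    return word_list, counts


word_list, counts = build_word_list(
    ["A paper on indexing dataspace.pdf", "paper_draft 2013.doc"]
)
print(word_list)
print(counts["paper"])
```

The first return value plays the role of Table A, and the counter corresponds to the word statistics of Fig. 9.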
Second, build the keyword synonym graph
For each word in Fig. 9, its synonyms are obtained from the online dictionary website, and Table A in Fig. 6 is checked for each synonym. If the synonym is present, the pair is stored in the database, as in Table B of Fig. 6.
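A minimal sketch of this filtering step is shown below. The online dictionary lookup and the web crawling are replaced by a hypothetical callback, `lookup_synonyms`, so that the example stays self-contained; `fake_lookup` and its canned data are invented purely for illustration and do not reflect any real dictionary website.

```python
def build_synonym_table(vocabulary, lookup_synonyms):
    """Keep only those returned synonyms that also occur in the user's
    own vocabulary, giving a personalized synonym table (Table B)."""
    vocab = set(vocabulary)
    table = {}
    for word in sorted(vocab):
        related = sorted(
            synonym
            for synonym in lookup_synonyms(word)
            if synonym in vocab and synonym != word
        )
        if related:
            table[word] = related
    return table


def fake_lookup(word):
    """Hypothetical stand-in for crawling an online dictionary page."""
    canned = {
        "paper": ["article", "essay", "document"],
        "article": ["paper", "essay"],
        "indexing": ["cataloguing"],
    }
    return canned.get(word, [])


vocabulary = ["paper", "article", "indexing", "dataspace"]
print(build_synonym_table(vocabulary, fake_lookup))
```

Synonyms such as "essay" are discarded because the user never used them in a file name, which keeps the table small and specific to one user.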
The word preference degree of each word in the personal desktop vocabulary can be calculated with the following formula: $\mathrm{Preference}(w_i) = \dfrac{w_i.\mathrm{Times}}{\sum_{j=1}^{n} w_j.\mathrm{Times}}$, where the numerator $w_i.\mathrm{Times}$ is the number of occurrences of the synonym $w_i$, and the denominator $\sum_{j=1}^{n} w_j.\mathrm{Times}$ is the total number of occurrences of all $n$ words in its synonym group. The results are shown in Fig. 7.
From the personal desktop vocabulary in Fig. 9, the desktop synonym graph can be built, as shown in Fig. 8. Words that have no synonyms are omitted from the graph; only words with synonyms are retained.
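One way to represent such a graph is an adjacency map built from synonym pairs, sketched below. The function names are hypothetical, and treating a synonym group as a connected component of the graph is an assumption of this sketch rather than something the source states.

```python
def build_synonym_graph(synonym_pairs):
    """Build an undirected graph: each word maps to the set of its synonyms.
    Words without synonyms never appear, since only pairs are inserted."""
    graph = {}
    for first, second in synonym_pairs:
        graph.setdefault(first, set()).add(second)
        graph.setdefault(second, set()).add(first)
    return graph


def synonym_group(graph, word):
    """Return every word connected to `word`, including `word` itself."""
    if word not in graph:
        return set()
    seen = {word}
    stack = [word]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Illustrative pairs only; not taken from the patent's figures.
pairs = [("paper", "article"), ("article", "essay")]
graph = build_synonym_graph(pairs)
print(sorted(synonym_group(graph, "paper")))
print(synonym_group(graph, "dataspace"))
```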
Take the file "A paper on indexing dataspace.pdf" as an example. The words paper, indexing, and dataspace are all present in the personal desktop vocabulary, and their word preference degrees can be calculated: (paper, 0.40), (indexing, 0.50), (dataspace, 0.50). For instance, the members of the synonym group consisting of paper, article, and the Chinese word for "paper" occur 2, 1, and 2 times respectively in the user's whole word list, so the preference degree of paper is $2 / (2 + 1 + 2) = 0.40$. The resulting file keyword vector is (indexing, dataspace, paper), that is, the words sorted by their word preference degrees.
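The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and `paper_zh` is a placeholder key for the Chinese synonym, whose original spelling is not recoverable from the translated text. The preference degrees for indexing and dataspace are taken directly from the example rather than recomputed, because their synonym groups are not listed.

```python
def word_preference(group_counts):
    """Preference of each word = its count / total count of its synonym group."""
    total = sum(group_counts.values())
    return {word: count / total for word, count in group_counts.items()}


def file_keyword_vector(words, preference):
    """Order a file's words by descending preference (ties keep input order)."""
    return sorted(words, key=lambda word: -preference[word])


# Occurrence counts 2, 1, 2 as given in the example.
group = {"paper": 2, "article": 1, "paper_zh": 2}
prefs = word_preference(group)
print(round(prefs["paper"], 2))

# 0.50 values are quoted from the example, not derived here.
preference = {"paper": prefs["paper"], "indexing": 0.50, "dataspace": 0.50}
print(file_keyword_vector(["paper", "indexing", "dataspace"], preference))
```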
Third, query
1. Suppose the user needs to find files for the keyword "article". After the user enters "article", its synonyms are first looked up in Table B of Fig. 6, which yields "paper" and the Chinese word for "paper";
2. Using string matching, Table A in Fig. 6 is searched for files whose names contain "article", "paper", or the Chinese word for "paper". Five files are returned, because each of their file names contains one of these three words;
3. The files are sorted according to the user's word preference degrees, giving the result shown in Fig. 10.
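The ranking in the last step might look like the sketch below. The source does not spell out the exact ranking rule, so scoring each file by the highest preference degree among its matched words is an assumption, and the file names and the function `rank_results` are hypothetical.

```python
def rank_results(matches, preference):
    """Sort matched files by the highest preference degree among the
    query words each file name contains (ties keep their input order).
    Every file in `matches` must list at least one matched word."""
    def score(file_name):
        return max(preference.get(word, 0.0) for word in matches[file_name])

    return sorted(matches, key=lambda file_name: -score(file_name))


# Preference degrees for paper and article follow the earlier example.
preference = {"paper": 0.40, "article": 0.20}

# Hypothetical matches: file name -> query words found in that name.
matches = {
    "survey article.doc": ["article"],
    "A paper on indexing dataspace.pdf": ["paper"],
    "article about paper.txt": ["article", "paper"],
}
print(rank_results(matches, preference))
```

Under this rule, files named with the word the user favors most within a synonym group appear first.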
As the above shows, the method of the invention is novel, concise, practical, and easy to implement. It can also greatly reduce users' file search time, makes it convenient for users to query personal desktop files, and improves the recall and accuracy of file retrieval.
Other advantages and modifications will be readily apparent to those of ordinary skill in the art. The invention in its broader aspects is therefore not limited to the specific details and exemplary embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit and scope of the general inventive concept as defined by the appended claims and their equivalents.

Claims (1)

1. A personal desktop file search method based on synonyms, characterized in that the method comprises:
Step 1: segmenting the file names in the data set collected by a prototype system into words using an existing word segmentation tool, filtering out the resulting words that have no practical meaning or that contain digits, and storing the words corresponding to each file name in a database as the user's word list;
Step 2: after segmentation, matching synonyms for the words of the file names using an online dictionary website;
Step 2.1: traversing all words, submitting each word as a search term to the online dictionary website;
Step 2.2: the website returns a result page for the word, the page containing the word's basic definition, synonyms, near-synonyms, and antonyms; extracting the word's synonyms and near-synonyms using web crawling;
Step 2.3: for each crawled synonym or near-synonym, traversing the user's word list obtained after segmentation; if the word list contains the crawled word, storing the search word and that synonym in the database as a pair of related words, forming the synonym table;
Step 3: querying based on the input keyword, using string matching combined with the corresponding synonyms;
Step 3.1: inputting a keyword K for querying desktop files;
Step 3.2: looking up the synonym table in the database to obtain the synonym set S corresponding to the keyword;
Step 3.3: taking the keyword together with the synonyms found as the search keywords of the document query, forming the set SK;
Step 3.4: traversing each word in the set SK and looking up its corresponding file names in the user's word list in the database;
Step 3.5: returning the query results.
CN201310128267.4A 2013-04-15 2013-04-15 Personal desktop document searching method based on synonyms Expired - Fee Related CN103177122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310128267.4A CN103177122B (en) 2013-04-15 2013-04-15 Personal desktop document searching method based on synonyms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310128267.4A CN103177122B (en) 2013-04-15 2013-04-15 Personal desktop document searching method based on synonyms

Publications (2)

Publication Number Publication Date
CN103177122A CN103177122A (en) 2013-06-26
CN103177122B true CN103177122B (en) 2017-04-26

Family

ID=48636983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310128267.4A Expired - Fee Related CN103177122B (en) 2013-04-15 2013-04-15 Personal desktop document searching method based on synonyms

Country Status (1)

Country Link
CN (1) CN103177122B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912606A (en) * 2016-04-05 2016-08-31 湖南人文科技学院 Synonym expansion based relational database keyword search method
CN108108373B (en) 2016-11-25 2020-09-25 阿里巴巴集团控股有限公司 Name matching method and device
CN112907398A (en) * 2019-02-20 2021-06-04 深圳大维理文科技有限公司 Inventor identification method and inventor identification system
CN112256822A (en) * 2020-10-21 2021-01-22 平安科技(深圳)有限公司 Text search method and device, computer equipment and storage medium
CN118227776B (en) * 2024-05-23 2024-07-23 四川省肿瘤医院 Disease science popularization method and system based on artificial intelligence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0730765B1 (en) * 1993-11-22 2003-09-17 Lexis-Nexis, A Division Of Reed Elsevier Inc. Associative text search and retrieval system
CN101350027A (en) * 2007-07-19 2009-01-21 富士胶片株式会社 Content retrieving device and retrieving method
CN102722498A (en) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 Search engine and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
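Ontology semantic information retrieval based on natural language understanding; Zhang Zongren (张宗仁); CNKI China Master's Theses Full-text Database; 20111015; Sections 2.3, 4.1-4.2, 5.2, 5.4, Fig. 6-2 *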

Also Published As

Publication number Publication date
CN103177122A (en) 2013-06-26

Similar Documents

Publication Publication Date Title
Tablan et al. Mímir: An open-source semantic search framework for interactive information seeking and discovery
US8140579B2 (en) Method and system for subject relevant web page filtering based on navigation paths information
CN103886099B (en) Semantic retrieval system and method of vague concepts
CN103177122B (en) Personal desktop document searching method based on synonyms
US9971828B2 (en) Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries
Thangaraj et al. An architectural design for effective information retrieval in semantic web
WO2007132342A1 (en) Documentary search procedure in a distributed information system
Sarda et al. Mragyati: A system for keyword-based searching in databases
Grineva et al. Blognoon: Exploring a topic in the blogosphere
Al-Zoghby et al. Mining Arabic text using soft-matching association rules
Latif et al. CAF-SIAL: Concept aggregation framework for structuring informational aspects of linked open data
Qin et al. Research on search results optimization technology with category features integration
Ganta et al. Search engine optimization through spanning forest generation algorithm
Iyad et al. Towards supporting exploratory search over the Arabic web content: The case of ArabXplore
Selvi et al. An approach to improve precision and recall for ad-hoc information retrieval using sbir algorithm
Yang The top 40 citation classics in the Journal of the American Society for Information Science and Technology
Sharma et al. Improved stemming approach used for text processing in information retrieval system
Zhao et al. Searching desktop files based on synonym relationship
Chun et al. Semantic annotation and search for deep web services
Jayanthi et al. Referenced attribute Functional Dependency Database for visualizing web relational tables
Li et al. The ontology relation extraction for semantic web annotation
CN103514256A (en) Rationalization proposal full-text retrieval system
Sardar et al. Resource Selection in Federated Web Search
Liang et al. SWARMS: A New Tool for Domain Exploration in Semantic Web
Li An Approach to Semantic Information Retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170426

Termination date: 20210415