CN104899262B - An information categorization method supporting user-defined categorization rules
- Publication number: CN104899262B
- Application number: CN201510262625.XA
- Authority: CN (China)
- Prior art keywords: rule, keyword, information, relation, weight
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
Abstract
The invention belongs to the field of database applications and relates in particular to a method for categorizing information in a database that supports user-defined categorization rules. Based on categorization rules customized by the user, the method lets the database be searched with a whole sentence and returns information that is close or similar to the query sentence, or that has a latent relation to it. The method thereby helps users obtain more comprehensive information.
Description
Technical field
The invention belongs to the field of database applications, and in particular relates to a method for categorizing information in a database that supports user-defined categorization rules.
Background
Information categorization means organizing the information in a database by category, within a certain structural system, for a given purpose, guided by certain classification principles and methods, and according to the content, nature, and relevance of the information.
Information categorization works as follows. First, information is stored in the database, and its key content is extracted at the same time to serve as the basis for categorization. Second, categorization rules are defined according to the requirements at hand. Third, information in the database with identical or similar content is grouped together according to those rules.
The technology most closely related to information categorization is information retrieval. For a database, retrieval usually means performing an exact or fuzzy search on the keywords entered by the user, obtaining the information that matches the search content, and returning it to the user.
At present, both exact and fuzzy database searches rely on keyword-based retrieval. Such retrieval cannot obtain information that is close or similar to the search content, nor information that has a latent relation to it.
Summary of the invention
The purpose of the invention is to overcome the above shortcomings of the prior art by providing an information categorization method that supports user-defined categorization rules and sentence-oriented database retrieval, so that related, close, or latently related information can be categorized together.
The invention is an information categorization method that supports user-defined categorization rules. Based on the categorization rules customized by the user, it supports sentence retrieval against the database and obtains information that is close or similar to the query sentence. It comprises the following steps:
(1) Information categorization rule modeling. The rules used for information categorization are described as a graph. Each node in the graph represents one keyword, including the keyword content and the keyword weight. Each edge represents the relation between two keywords, including the relation content and the relation weight. In practice, an edge is represented by a triple (subject, predicate, object): the relation between the subject node and the object node is the predicate. The user customizes the rules used for information categorization by customizing this rule relation graph;
(2) Rule-based query sentence segmentation. All keywords in the rules are obtained by traversing the user-customized rule relation graph, and they form a keyword set. After the user enters a query sentence, the matching keywords are looked up in the keyword set to produce the segmentation result;
(3) Rule-based search keyword expansion. Each keyword in the segmentation result from step (2) is processed in turn as a core keyword. Under the control of the user-customized search depth, keywords that are close or related to it are obtained together with their associated weights, which finally yields the expanded keyword set. Because the associations between keywords in the rules form a graph topology, the number of layers by which a keyword is expanded, that is, the user-customized search depth, must be limited to keep inference efficient;
(4) Using the expanded keyword set, an exact or fuzzy search is carried out in the database to obtain the corresponding content. According to the rule relation graph, keywords related or similar to the core keyword being processed can be derived, so further retrieval with these keywords obtains information related or similar to the query sentence. Likewise, keywords that have a latent semantic relation to the core keyword can be derived from the rule relation graph, and further retrieval with them obtains information that has a latent semantic relation to the query sentence.
The invention suits all kinds of users who need information categorization. It lets users customize categorization rules on demand, so they can modify existing rules or define new ones at any time. The key steps of the invention are all based on the user-customized categorization rules. On the one hand, different rules lead the segmentation and keyword expansion operations to different results, so the effect of information categorization changes as the rules are customized. On the other hand, users can keep refining the rules according to the categorization results they observe. Besides results directly associated with the original query sentence, categorization with the invention also yields results that are related or similar to it or that have a latent relation to it, which helps users obtain more comprehensive information.
Brief description of the drawings
Fig. 1 is a flowchart of the rule-based query sentence segmentation algorithm of the invention.
Fig. 2 is a flowchart of the rule-based keyword expansion algorithm of the invention.
Embodiment
To implement the method, the rule relation graph is first constructed according to step 1 and stored in the database. The technical solution is described in detail below, taking as an example an application that implements the method in Java under the Eclipse development environment on a development machine.
Step 1: Information categorization rule modeling.
Select a suitable rule modeling tool and build the rules, described as a graph, according to the user's needs. Each node in the graph represents one keyword, including the keyword content and the keyword weight. Each edge represents the relation between two keywords, including the relation content and the relation weight. In practice, an edge is represented by a triple (subject, predicate, object): the relation between the subject node and the object node is the predicate. The user customizes the rules used for information categorization by customizing this rule relation graph.
This embodiment defines a web interface through which the user uploads a rule file. The rule file is parsed and the resulting triples are stored in the database for use in later steps. While the parsed triples are being stored, traversing them also yields the keyword set used in the later steps.
Step 2: Rule-based query sentence segmentation.
Unlike a conventional word segmentation program, the segmentation here is based on the user-customized rules, so the same query sentence may be segmented differently under different rules.
As shown in Fig. 1, the rule-based query sentence segmentation algorithm is as follows:
Step 1, let the string under consideration start at index i, with i = 0;
Step 2, starting from i, if the remaining string is at least MaxLen characters long, cut a string CutWord of length MaxLen; otherwise take the whole remaining substring as CutWord. MaxLen is the length of the longest keyword in the rule keyword set;
Step 3, check whether CutWord is a word in the rule keyword set. If it is, add CutWord to the segmentation result set and go to step 5; otherwise go to step 4;
Step 4, if the length of CutWord is 0, go to step 5; otherwise delete the last character of CutWord and go to step 3;
Step 5, delete the matched part and add 1 to i. If i is greater than or equal to the length of the query string, the program stops and returns the segmentation result set; otherwise go to step 2.
The variables used in the above rule-based query sentence segmentation algorithm are listed in Table 1.
Table 1. Variables in the rule-based query sentence segmentation algorithm.
Variable name | Variable type | Meaning |
---|---|---|
CutWord | String | The candidate keyword cut from the query sentence on each pass |
i | int | The start position of each cut |
MaxLen | int | Keyword length threshold; no keyword in the rule keyword set is longer than this value |
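The segmentation procedure above is a forward longest-match scan, which can be sketched as follows. This is a hedged illustration rather than the patent's code: the class `RuleSegmenter` and method `segment` are hypothetical names, and the sketch assumes that after a match the index advances past the whole matched keyword, and by one character when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical forward longest-match segmenter driven by the rule keyword set. */
final class RuleSegmenter {

    static List<String> segment(String query, Set<String> keywords) {
        // MaxLen: length of the longest keyword in the rule keyword set.
        int maxLen = 0;
        for (String keyword : keywords) {
            maxLen = Math.max(maxLen, keyword.length());
        }

        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            // Cut at most MaxLen characters starting at i.
            int end = Math.min(query.length(), i + maxLen);
            String cutWord = query.substring(i, end);

            // Drop the last character until CutWord is a keyword or becomes empty.
            while (!cutWord.isEmpty() && !keywords.contains(cutWord)) {
                cutWord = cutWord.substring(0, cutWord.length() - 1);
            }

            if (cutWord.isEmpty()) {
                i += 1;                  // no keyword starts here (assumed advance by one)
            } else {
                result.add(cutWord);
                i += cutWord.length();   // skip the matched part (assumed advance)
            }
        }
        return result;
    }
}
```

With the keyword set {"database", "data", "search"}, the query "search the database" would be segmented into "search" and "database": the longer keyword wins over "data" because the scan starts from the longest candidate and only shortens it on a miss.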
Step 3: Rule-based search keyword expansion.
This step reads the triples from the database and builds the rule relation graph from them. Then, taking each keyword as the center, it searches for other keywords that are associated with or similar to it, parses out the relation weight between them and the weights of the related keywords, and finally sorts all the keywords obtained by comprehensive weight.
As shown in Fig. 2, the rule-based search keyword expansion algorithm is as follows:
Step 1, if the segmentation result set is empty, go to step 9; otherwise take one keyword out of it, delete it from the set, and go to step 2;
Step 2, empty the to-be-expanded keyword set, add the information of the keyword just taken out to the to-be-expanded keyword set and to the expansion result set, set the current search layer j = 2, and go to step 3;
Step 3, if j exceeds the customized search depth, go to step 1; otherwise add 1 to j and go to step 4;
Step 4, if the to-be-expanded keyword set is empty, go to step 7; otherwise select one keyword from it as the center keyword, delete it from the set, and go to step 5;
Step 5, with the center keyword as the center, search the rules to obtain the set of triples associated with it, and go to step 6;
Step 6, if the associated triple set is empty, go to step 4; otherwise select one triple from it and delete it. Parse the triple to obtain a keyword related to the center keyword, combine the parsed relation weight with the keyword weight into a comprehensive weight, store the related keyword's information, including the comprehensive weight, in an intermediate expansion set, and go to step 6;
Step 7, remove duplicate elements from the intermediate expansion set. If the intermediate expansion set is empty, go to step 3; otherwise select one keyword from it and go to step 8;
Step 8, add the selected keyword to the expansion result set and check whether it has already been expanded. If it has not, add it to the to-be-expanded keyword set. Go to step 7;
Step 9, remove duplicate elements from the expansion result set, sort it by weight in descending order, return the result, and stop.
The variables used in the above rule-based keyword expansion algorithm are defined in Table 2.
Table 2. Variables in the rule-based keyword expansion algorithm.
Note: AtomWord in Table 2 denotes keyword information, comprising the content and the weight of a keyword.
Tripe in Table 2 denotes triple information, i.e. (subject, predicate, object).
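The expansion of a single core keyword amounts to a depth-limited, layer-by-layer traversal of the rule relation graph. The sketch below simplifies the numbered steps into that form and rests on several assumptions that the text does not settle: the comprehensive weight is taken to be the product of the center keyword's weight and the relation weight, the core keyword starts with weight 1.0, edges are followed in both directions, and the highest weight is kept when a keyword is reached more than once. `KeywordExpander`, `Triple`, and `expand` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical depth-limited keyword expansion over (subject, predicate, object) triples. */
final class KeywordExpander {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final double relationWeight;

        Triple(String subject, String predicate, String object, double relationWeight) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.relationWeight = relationWeight;
        }
    }

    /** Returns keyword -> comprehensive weight, ordered by weight in descending order. */
    static Map<String, Double> expand(String coreKeyword, List<Triple> rules, int maxDepth) {
        Map<String, Double> result = new LinkedHashMap<>();   // expansion result set
        result.put(coreKeyword, 1.0);                         // assumed starting weight
        Set<String> expanded = new HashSet<>();               // keywords already expanded
        List<String> toExpand = new ArrayList<>();            // to-be-expanded keyword set
        toExpand.add(coreKeyword);

        for (int layer = 1; layer <= maxDepth && !toExpand.isEmpty(); layer++) {
            Map<String, Double> intermediate = new LinkedHashMap<>(); // intermediate set
            for (String center : toExpand) {
                expanded.add(center);
                double centerWeight = result.get(center);
                for (Triple t : rules) {
                    String related;
                    if (t.subject.equals(center)) {
                        related = t.object;
                    } else if (t.object.equals(center)) {
                        related = t.subject;
                    } else {
                        continue; // triple is not associated with the center keyword
                    }
                    // Assumed combination rule: product of the two weights, keep the maximum.
                    intermediate.merge(related, centerWeight * t.relationWeight, Math::max);
                }
            }
            List<String> next = new ArrayList<>();
            for (Map.Entry<String, Double> e : intermediate.entrySet()) {
                result.merge(e.getKey(), e.getValue(), Math::max);
                if (!expanded.contains(e.getKey())) {
                    next.add(e.getKey());
                }
            }
            toExpand = next;
        }

        // Sort by comprehensive weight, highest first.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(result.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        Map<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```

The `maxDepth` parameter plays the role of the user-customized search depth: a larger value reaches more distant keywords at the cost of visiting more of the graph. Running `expand` once per keyword in the segmentation result and merging the maps would give the full expanded keyword set.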
Once the keyword expansion result is available, an exact or fuzzy search is carried out in the database with these keywords to obtain the search results, which are finally sorted by the associated weights of the keywords. In an implementation of the invention, users can customize the information categorization rules on demand, both by creating new rules and by modifying existing ones. When searching, users can submit a whole sentence directly and are not limited to single keywords: based on the user-customized categorization rules, the invention segments the query sentence and extracts the search keywords relevant to those rules. For each keyword obtained by segmentation, the invention expands it within the user-customized rules to obtain other related or similar keywords, and searching the database with these keywords returns content that is related or close to the user's original query. Other keywords that have a latent semantic association with a search keyword in the rules are obtained in the same way, so content that has a latent connection with the user's original query is obtained as well.
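The final retrieval and ranking can be illustrated with an in-memory stand-in for the database. This is only a sketch under assumptions: the records are plain strings, fuzzy search is approximated by substring matching, and a record is scored by the highest weight among the keywords it contains. `WeightedRetriever` and `retrieve` are hypothetical names, and a real implementation would issue database queries instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for fuzzy retrieval ranked by keyword weight. */
final class WeightedRetriever {

    static List<String> retrieve(List<String> records, Map<String, Double> weightedKeywords) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        for (String record : records) {
            double best = -1.0;
            for (Map.Entry<String, Double> kw : weightedKeywords.entrySet()) {
                // Substring match stands in for the database's fuzzy search.
                if (record.contains(kw.getKey())) {
                    best = Math.max(best, kw.getValue());
                }
            }
            if (best >= 0.0) {
                hits.add(new AbstractMap.SimpleEntry<String, Double>(record, best));
            }
        }
        // Highest associated keyword weight first; the sort is stable for equal weights.
        hits.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> hit : hits) {
            ranked.add(hit.getKey());
        }
        return ranked;
    }
}
```

Records matched only through an expanded keyword rank below records matched through the core keyword, which keeps directly relevant content at the top while still surfacing related content.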
Claims (4)
- 1. An information categorization method supporting user-defined categorization rules, characterized in that the method comprises the following steps:
  (1) Information categorization rule modeling: the rules used for information categorization are described as a graph; each node in the graph represents one keyword, including the keyword content and the keyword weight; each edge in the graph represents the relation between two keywords, including the relation content and the relation weight; in practice, an edge in the graph is represented by a triple, namely subject, predicate, and object, that is, the relation between the subject node and the object node is the predicate; the user customizes the rules used for information categorization by customizing this rule relation graph;
  (2) Rule-based query sentence segmentation: all keywords in the rules are obtained by traversing the user-customized rule relation graph and form a keyword set; after the user enters a query sentence, the matching keywords are found in the keyword set to obtain the segmentation result;
  (3) Rule-based search keyword expansion: each keyword in the segmentation result obtained in step (2) is processed in turn as a core keyword; under the control of the user-customized search depth, keywords close or related to it and their associated weights are obtained, finally yielding the expanded keyword set;
  (4) Using the expanded keyword set, an exact or fuzzy search is carried out in the database to obtain the corresponding content.
- 2. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the information categorization rule modeling in step (1) includes creating or modifying information categorization rules, that is, the user can create a new graph or modify an existing graph.
- 3. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the rule-based query sentence segmentation in step (2) proceeds as follows:
  First step, let the string under consideration start at index i, with i = 0;
  Second step, starting from i, if the remaining string is at least MaxLen characters long, cut a string CutWord of length MaxLen; otherwise take the remaining substring as CutWord, where MaxLen is the length of the longest keyword in the rule keyword set;
  Third step, check whether CutWord is a word in the rule keyword set; if it is, add CutWord to the segmentation result set and go to the fifth step; otherwise go to the fourth step;
  Fourth step, if the length of CutWord is 0, go to the fifth step; otherwise delete the last character of CutWord and go to the third step;
  Fifth step, delete the matched part and add 1 to i; if i is greater than or equal to the length of the query string, the program stops and returns the segmentation result set; otherwise go to the second step.
- 4. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the rule-based search keyword expansion in step (3) proceeds as follows:
  First step, if the segmentation result set is empty, go to the ninth step; otherwise take one keyword out of it, delete it from the set, and go to the second step;
  Second step, empty the to-be-expanded keyword set, add the information of the keyword just taken out to the to-be-expanded keyword set and to the expansion result set, set the current search layer j = 2, and go to the third step;
  Third step, if j exceeds the customized search depth, go to the first step; otherwise add 1 to j and go to the fourth step;
  Fourth step, if the to-be-expanded keyword set is empty, go to the seventh step; otherwise select one keyword from it as the center keyword, delete it from the set, and go to the fifth step;
  Fifth step, with the center keyword as the center, search the rules to obtain the set of triples associated with it, and go to the sixth step;
  Sixth step, if the associated triple set is empty, go to the fourth step; otherwise select one triple from it and delete it; parse the triple to obtain a keyword related to the center keyword, combine the parsed relation weight with the keyword weight into a comprehensive weight, store the related keyword's information, including the comprehensive weight, in an intermediate expansion set, and go to the seventh step;
  Seventh step, remove duplicate elements from the intermediate expansion set; if the intermediate expansion set is empty, go to the third step; otherwise select one keyword from it and go to the eighth step;
  Eighth step, add the selected keyword to the expansion result set and check whether it has already been expanded; if it has not, add it to the to-be-expanded keyword set; go to the seventh step;
  Ninth step, remove duplicate elements from the expansion result set, sort it by weight in descending order, return the result, and stop.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510262625.XA CN104899262B (en) | 2015-05-22 | 2015-05-22 | An information categorization method supporting user-defined categorization rules |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510262625.XA CN104899262B (en) | 2015-05-22 | 2015-05-22 | An information categorization method supporting user-defined categorization rules |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104899262A (en) | 2015-09-09 |
CN104899262B (en) | 2017-12-22 |
Family
ID=54031925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510262625.XA Expired - Fee Related CN104899262B (en) | 2015-05-22 | 2015-05-22 | An information categorization method supporting user-defined categorization rules |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104899262B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106815207B (en) * | 2015-12-01 | 2020-08-11 | 北京国双科技有限公司 | Information processing method and device for legal referee document |
CN106126545A (en) * | 2016-06-15 | 2016-11-16 | 北京智能管家科技有限公司 | Distributed fission querying method and device |
CN107577779A (en) * | 2017-09-13 | 2018-01-12 | 陕西铺铺旺数字科技有限公司 | Method and device based on querying condition weight proportion inquiry data groups |
CN108717853B (en) * | 2018-05-09 | 2020-11-20 | 深圳艾比仿生机器人科技有限公司 | Man-machine voice interaction method, device and storage medium |
CN117851614B (en) * | 2024-03-04 | 2024-05-14 | 创意信息技术股份有限公司 | Searching method, device and system for mass data and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
CN103164415A (en) * | 2011-12-09 | 2013-06-19 | 富士通株式会社 | Expansion keyword obtaining method based on microblog platform and equipment |
CN103377226A (en) * | 2012-04-25 | 2013-10-30 | 中国移动通信集团公司 | Intelligent search method and system thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8036937B2 (en) * | 2005-12-21 | 2011-10-11 | Ebay Inc. | Computer-implemented method and system for enabling the automated selection of keywords for rapid keyword portfolio expansion |
Also Published As
Publication number | Publication date |
---|---|
CN104899262A (en) | 2015-09-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104899262B (en) | An information categorization method supporting user-defined categorization rules | |
CN108647276B (en) | Searching method | |
Lin et al. | A fast algorithm for mining fuzzy frequent itemsets | |
CN101169780A (en) | Semantic ontology retrieval system and method | |
CN104239513A (en) | Semantic retrieval method oriented to field data | |
CN103440314A (en) | Semantic retrieval method based on Ontology | |
CN106547864A (en) | A kind of Personalized search based on query expansion | |
CN103324700A (en) | Noumenon concept attribute learning method based on Web information | |
CN107239512A (en) | The microblogging comment spam recognition methods of relational network figure is commented in a kind of combination | |
CN105404677B (en) | A kind of search method based on tree structure | |
CN108228656B (en) | URL classification method and device based on CART decision tree | |
CN103226601B (en) | A kind of method and apparatus of picture searching | |
CN105389328B (en) | A kind of extensive open source software searching order optimization method | |
CN103279492A (en) | Method and device for catching webpage | |
Setayesh et al. | Presentation of an Extended Version of the PageRank Algorithm to Rank Web Pages Inspired by Ant Colony Algorithm | |
Shekhar et al. | An architectural framework of a crawler for retrieving highly relevant web documents by filtering replicated web collections | |
WO2012091541A1 (en) | A semantic web constructor system and a method thereof | |
CN105426490B (en) | A kind of indexing means based on tree structure | |
Annam et al. | Entropy based informative content density approach for efficient web content extraction | |
Dai et al. | Search Engine System Based on Ontology of Technological Resources. | |
CN102339292A (en) | Distributed searching method and system | |
Manuja et al. | Intelligent text classification system based on self-administered ontology | |
CN109241124A (en) | A kind of method and system of quick-searching similar character string | |
Thenmalar et al. | The modified concept based focused crawling using ontology | |
Maw | An improvement of FP-growth mining algorithm using linked list |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20171222 |