CN104899262B - An information categorization method supporting user-defined categorization rules - Google Patents

An information categorization method supporting user-defined categorization rules

Info

Publication number
CN104899262B
Authority
CN
China
Prior art keywords
rule
keyword
information
relation
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510262625.XA
Other languages
Chinese (zh)
Other versions
CN104899262A (en)
Inventor
叶俊民
祝黄建
叶竹君
陈曙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Normal University
Original Assignee
Huazhong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Normal University
Priority to CN201510262625.XA
Publication of CN104899262A
Application granted
Publication of CN104899262B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of database applications, and in particular relates to a method for categorizing information in a database that supports user-defined categorization rules. Based on the categorization rules customized by the user, the method supports sentence retrieval on the database and obtains information whose content is close or similar to the retrieval sentence, or that has a potential relation to it. The method helps users obtain more comprehensive information.

Description

An information categorization method supporting user-defined categorization rules
Technical field
The invention belongs to the field of database applications, and in particular relates to a method for categorizing information in a database that supports user-defined categorization rules.
Background
Information categorization means organizing database information into categories under a certain structural system, for a certain purpose, guided by certain classification principles and methods, and according to the content, nature, and relevance requirements of the information.
Information categorization works as follows: first, information is stored in the database and its key content is extracted to serve as the basis for categorization; second, categorization rules are defined according to the relevant requirements; third, information in the database with the same or similar content is grouped together according to the categorization rules.
The technology related to information categorization is information retrieval. For a database, retrieval usually performs an exact or fuzzy search on the search keywords entered by the user, obtains the information that matches the search content, and returns it to the user.
At present, both exact and fuzzy database searches rely on keyword-based retrieval. Such retrieval cannot obtain information whose content is close or similar to the search content, nor information that has a potential relation to the search content.
Summary of the invention
The purpose of the invention is to overcome the above shortcomings of the prior art by providing an information categorization method that supports user-defined categorization rules and sentence-oriented database retrieval, so that related, similar, or potentially related information can be categorized.
The invention is an information categorization method supporting user-defined categorization rules. Based on the categorization rules customized by the user, it supports sentence retrieval on a database and obtains information whose content is close or similar to the retrieval sentence. It comprises the following steps:
(1) Information categorization rule modeling. The rules used for information categorization are described by a graph. Each node in the graph represents the information of one keyword, including the keyword content and the keyword weight. Each edge represents the relation between two keywords, including the relation content and the relation weight. In practice, an edge is represented by a triple (subject, predicate, object): the relation between the subject node and the object node is the predicate. The user customizes the rules for information categorization by customizing this rule relation graph;
(2) Rule-based segmentation of the retrieval sentence. All keywords in the rules are obtained by traversing the user-customized rule relation graph and form a keyword set. After the user enters a retrieval sentence, the matching keywords are found in the keyword set, which yields the segmentation result;
(3) Rule-based expansion of the search keywords. Each keyword in the segmentation result of step (2) is processed in turn as a core keyword. Under the control of the user-customized number of search layers, the keywords close or related to it are obtained together with their associated weights, which finally yields the expanded keyword set. Because the associations between keywords in the rules form a graph topology, the number of expansion layers, that is, the user-customized number of search layers, must be limited to keep inference efficient;
(4) Retrieval with the expanded keyword set. An exact or fuzzy search is carried out in the database with the expanded keywords to obtain the corresponding content. The rule relation graph yields keywords that are related or similar to the core keyword being processed, so a further search with these keywords returns information related or similar to the retrieval sentence. Likewise, the rule relation graph yields keywords that have a potential semantic relation to the core keyword, so a further search with these keywords returns information that has a potential semantic relation to the retrieval sentence.
The invention is applicable to all kinds of users who need information categorization. It lets users customize the relevant categorization rules on demand, so that they can modify existing rules or define new ones at any time. The key steps of the invention are all based on the user-customized categorization rules. On the one hand, different customized rules lead to different results from the segmentation and keyword expansion operations, so the effect of information categorization changes with the rules. On the other hand, users can keep improving the rules according to the categorization results. Besides results directly associated with the initial retrieval sentence, categorization with the invention also returns results that are related or similar to it or have a potential relation to it, which helps users obtain more comprehensive information.
Brief description of the drawings
Fig. 1 is a flow chart of the rule-based retrieval sentence segmentation algorithm of the invention.
Fig. 2 is a flow chart of the rule-based keyword expansion algorithm of the invention.
Detailed description of embodiments
To implement the method, the rule relation graph is constructed in step 1 and stored in the database. The technical solution is described in detail below, taking as an example an application that implements the method in Java under the Eclipse development environment on a development machine.
Step 1: Modeling of the information categorization rules.
An appropriate rule modeling tool is selected, and the rules, described in graph form, are established according to the user's requirements. The rules used for information categorization are described by a graph. Each node in the graph represents the information of one keyword, including the keyword content and the keyword weight. Each edge represents the relation between two keywords, including the relation content and the relation weight. In practice, an edge is represented by a triple (subject, predicate, object): the relation between the subject node and the object node is the predicate. The user customizes the rules for information categorization by customizing this rule relation graph.
This embodiment defines a web interface through which the user uploads a rule file. The rule file is parsed and the resulting triples are stored in the database for use in the subsequent steps. While the parsed triples are being stored, traversing them also yields the keyword set used in the subsequent steps.
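The triple representation and the keyword set described in this step can be sketched in Java, the language of this embodiment. This is a minimal sketch under stated assumptions: the class name RuleGraph, the Triple fields, and the one-line rule format "subject:weight, predicate:weight, object:weight" are hypothetical, since the rule file format is not specified here, and an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative rule relation graph; all names and the line format are assumptions. */
final class RuleGraph {
    /** One edge of the graph: subject --predicate--> object, with the three weights. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final double subjectWeight;
        final double relationWeight;
        final double objectWeight;

        Triple(String subject, double subjectWeight, String predicate,
               double relationWeight, String object, double objectWeight) {
            this.subject = subject;
            this.subjectWeight = subjectWeight;
            this.predicate = predicate;
            this.relationWeight = relationWeight;
            this.object = object;
            this.objectWeight = objectWeight;
        }
    }

    // Stands in for the database table that stores the parsed triples.
    private final List<Triple> triples = new ArrayList<>();

    /** Parses one line of the assumed form "subject:0.9, predicate:0.5, object:0.8". */
    void addRuleLine(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected three fields: " + line);
        }
        String[] s = splitField(fields[0]);
        String[] p = splitField(fields[1]);
        String[] o = splitField(fields[2]);
        triples.add(new Triple(s[0], Double.parseDouble(s[1]),
                p[0], Double.parseDouble(p[1]),
                o[0], Double.parseDouble(o[1])));
    }

    /** Splits "name:weight" into its two trimmed parts. */
    private static String[] splitField(String field) {
        String[] parts = field.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected name:weight but got: " + field);
        }
        return new String[] {parts[0].trim(), parts[1].trim()};
    }

    List<Triple> triples() {
        return triples;
    }

    /** Traverses the triples and collects every subject and object keyword. */
    Set<String> keywordSet() {
        Set<String> keywords = new LinkedHashSet<>();
        for (Triple t : triples) {
            keywords.add(t.subject);
            keywords.add(t.object);
        }
        return keywords;
    }
}
```

Calling addRuleLine once per line of an uploaded rule file and then keywordSet() gives the keyword set that the later steps rely on. Storing the weights on the triple rather than on separate node objects is only a simplification for the sketch.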
Step 2: Rule-based segmentation of the retrieval sentence.
Unlike a conventional word segmentation program, the segmentation here is based on the user-customized rules, so the same retrieval sentence may be segmented differently under different rules.
As shown in Fig. 1, the rule-based retrieval sentence segmentation algorithm is as follows:
Step 1, set the start index i of the string currently considered to i=0;
Step 2, starting from i, if the length of the remaining string is greater than or equal to MaxLen, take a string CutWord of length MaxLen; otherwise take the remaining substring as CutWord, where MaxLen is the length of the longest keyword in the rule keyword set;
Step 3, check whether CutWord is a word in the rule keyword set; if so, add CutWord to the segmentation result set and go to step 5; otherwise go to step 4;
Step 4, if the length of CutWord is 0, go to step 5; otherwise delete the last character of CutWord and go to step 3;
Step 5, delete the matched part and increase i by 1; if i is greater than or equal to the length of the retrieval string, the program stops and returns the segmentation result set; otherwise go to step 2.
The variables used in the rule-based retrieval sentence segmentation algorithm are listed in Table 1.
Table 1. Variables in the rule-based retrieval sentence segmentation algorithm.
Variable name | Variable type | Meaning
CutWord | String | The keyword cut out of the retrieval sentence in each pass
i | int | The start position of each cut
MaxLen | int | Keyword length threshold; no keyword is longer than this value
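The segmentation procedure above is a forward maximum matching against the rule keyword set, and it can be sketched as follows. The class and method names are hypothetical. The sketch reads the final step as moving i past the matched keyword, or forward by one character when nothing matched; that reading is an assumption about the intended behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative forward maximum matching segmenter; names are assumptions. */
final class RuleSegmenter {
    /** Returns the rule keywords found in the sentence, in order of appearance. */
    static List<String> segment(String sentence, Set<String> keywords) {
        // MaxLen: length of the longest keyword in the rule keyword set.
        int maxLen = 0;
        for (String keyword : keywords) {
            maxLen = Math.max(maxLen, keyword.length());
        }

        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < sentence.length()) {
            // Take up to MaxLen characters starting at i.
            int end = Math.min(sentence.length(), i + maxLen);
            String cutWord = sentence.substring(i, end);

            // Drop the last character until CutWord is a keyword or is empty.
            while (!cutWord.isEmpty() && !keywords.contains(cutWord)) {
                cutWord = cutWord.substring(0, cutWord.length() - 1);
            }

            if (cutWord.isEmpty()) {
                i += 1;                  // nothing matched at i (assumed: skip one character)
            } else {
                result.add(cutWord);
                i += cutWord.length();   // assumed: skip the matched part
            }
        }
        return result;
    }
}
```

Because the longest candidate is tried first, a sentence containing "database" yields "database" rather than the shorter keyword "data" when both are in the rule keyword set. Changing the rules changes the keyword set and therefore the segmentation result.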
Step 3: Rule-based expansion of the search keywords.
This step reads the triples from the database and builds the rule relation graph. Then, taking each keyword as the center, it searches for other associated or similar keywords, obtains by parsing the relation weight between the two and the weight of the related keyword, and finally sorts all the keywords obtained by their combined weight.
As shown in Fig. 2, the rule-based search keyword expansion algorithm is as follows:
Step 1, if the segmentation result set is empty, go to step 9; otherwise take a keyword out of it, delete it from the set, and go to step 2;
Step 2, empty the set of keywords to be expanded, add the information of the keyword taken out to the set of keywords to be expanded and to the expansion result set, set the current search layer j=2, and go to step 3;
Step 3, if j exceeds the customized number of search layers, go to step 1; otherwise increase j by 1 and go to step 4;
Step 4, if the set of keywords to be expanded is empty, go to step 7; otherwise select a keyword from it, delete it from the set, and go to step 5;
Step 5, taking the selected keyword as the center, search the rules to obtain the set of triples associated with it, and go to step 6;
Step 6, if the set of associated triples is empty, go to step 4; otherwise select a triple from it and delete it from the set. Parse the triple to obtain a keyword related to the selected keyword, combine the parsed relation weight and the keyword weight into a combined weight, store the information of the related keyword, including the combined weight, in an intermediate expansion set, and go to step 6;
Step 7, remove the duplicate elements from the intermediate expansion set; if the set is empty, go to step 3; otherwise select a keyword from it and go to step 8;
Step 8, add the selected keyword to the expansion result set and check whether it has already been expanded; if not, add it to the set of keywords to be expanded; go to step 7;
Step 9, remove the duplicate elements from the expansion result set, sort it by weight in descending order, return the result, and stop the program.
The variables used in the rule-based keyword expansion algorithm are defined in Table 2.
Table 2. Variables in the rule-based keyword expansion algorithm.
Note: AtomWord in Table 2 denotes the information of a keyword, including its content and weight.
Tripe in Table 2 denotes the information of a triple, that is, (subject, predicate, object).
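The expansion procedure can be read as a layer-limited breadth-first traversal of the rule relation graph, sketched below for a single core keyword. Several points are assumptions rather than part of the description: the names KeywordExpander, Triple, and expand are hypothetical; edges are followed in both directions; the combined weight is taken as the product of the current keyword's weight and the relation weight, because the combination formula is not given; and the layer counter simply runs from 0 to the customized limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative layer-limited keyword expansion; names and weighting are assumptions. */
final class KeywordExpander {
    /** Hypothetical edge: subject --(relationWeight)--> object. */
    static final class Triple {
        final String subject;
        final String object;
        final double relationWeight;

        Triple(String subject, double relationWeight, String object) {
            this.subject = subject;
            this.relationWeight = relationWeight;
            this.object = object;
        }
    }

    /** Returns keyword -> combined weight, ordered by weight in descending order. */
    static Map<String, Double> expand(String core, double coreWeight,
                                      List<Triple> rules, int maxLayers) {
        Map<String, Double> result = new HashMap<>();   // expansion result set
        result.put(core, coreWeight);
        List<String> toExpand = new ArrayList<>();      // keywords to be expanded
        toExpand.add(core);

        for (int layer = 0; layer < maxLayers && !toExpand.isEmpty(); layer++) {
            Map<String, Double> intermediate = new HashMap<>();  // intermediate expansion set
            for (String current : toExpand) {
                for (Triple t : rules) {
                    String related;
                    if (t.subject.equals(current)) {
                        related = t.object;
                    } else if (t.object.equals(current)) {
                        related = t.subject;
                    } else {
                        continue;
                    }
                    // Assumed combination: current keyword weight times relation weight.
                    double combined = result.get(current) * t.relationWeight;
                    // Duplicates collapse to the highest combined weight.
                    intermediate.merge(related, combined, Double::max);
                }
            }
            List<String> next = new ArrayList<>();
            for (Map.Entry<String, Double> e : intermediate.entrySet()) {
                if (!result.containsKey(e.getKey())) {   // not expanded before
                    result.put(e.getKey(), e.getValue());
                    next.add(e.getKey());
                }
            }
            toExpand = next;
        }

        // Sort by combined weight, highest first.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(result.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        Map<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```

With the layer limit set to 2, a keyword three relations away from the core keyword is not reached, which illustrates how the customized number of search layers bounds the traversal of the graph.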
After the keyword expansion result is obtained, an exact or fuzzy search is carried out in the database with these keywords to obtain the retrieval results, which are finally sorted by the associated weights of the keywords. In an implementation of the invention, users can customize the relevant information categorization rules on demand, both by creating new rules and by changing existing ones. When retrieving, the user can search with a whole sentence directly rather than with a single keyword only. Based on the user-customized categorization rules, the invention segments the retrieval sentence and extracts the search keywords relevant to the rules. For each keyword obtained by segmentation, keyword expansion within the user-customized rules yields other related or similar keywords, and a database search with these keywords returns content that is related or similar to the user's initial search content. Other keywords that have a potential semantic association with the search keywords in the rules are obtained in the same way, so content potentially connected to the user's initial search content is obtained as well.
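The final retrieval and ranking can be illustrated with a small sketch. It is not the embodiment's database code: an in-memory list of strings stands in for the database, a substring test stands in for a fuzzy query, and a record's score is assumed to be the highest weight among the expanded keywords it contains. The names WeightedRetrieval and retrieve are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative weighted retrieval over an in-memory stand-in for the database. */
final class WeightedRetrieval {
    /**
     * Returns the records that contain at least one expanded keyword,
     * ordered by the highest weight among their matching keywords.
     */
    static List<String> retrieve(List<String> records, Map<String, Double> keywords) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String record : records) {
            for (Map.Entry<String, Double> keyword : keywords.entrySet()) {
                // Substring match plays the role of a fuzzy database query.
                if (record.contains(keyword.getKey())) {
                    scores.merge(record, keyword.getValue(), Double::max);
                }
            }
        }
        List<String> hits = new ArrayList<>(scores.keySet());
        // Stable sort: records with equal scores keep their original order.
        hits.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return hits;
    }
}
```

Passing the output of a keyword expansion as the keywords map returns records matched only by an expanded keyword as well, ranked below the records that match the core keyword directly.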

Claims (4)

  1. An information categorization method supporting user-defined categorization rules, characterized in that the method comprises the following steps:
    (1) Information categorization rule modeling: the rules used for information categorization are described by a graph; each node in the graph represents the information of one keyword, including the keyword content and the keyword weight; each edge in the graph represents the relation between two keywords, including the relation content and the relation weight; in practice, an edge is represented by a triple, that is, subject, predicate, and object information, where the relation between the subject node and the object node is the predicate; the user customizes the rules for information categorization by customizing this rule relation graph;
    (2) Rule-based segmentation of the retrieval sentence: all keywords in the rules are obtained by traversing the user-customized rule relation graph and form a keyword set; after the user enters a retrieval sentence, the matching keywords are found in the keyword set, which yields the segmentation result;
    (3) Rule-based expansion of the search keywords: each keyword in the segmentation result obtained in step (2) is processed in turn as a core keyword; under the control of the user-customized number of search layers, the keywords close or related to it are obtained together with their associated weights, which finally yields the expanded keyword set;
    (4) An exact or fuzzy search is carried out in the database with the expanded keyword set to obtain the corresponding content.
  2. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the information categorization rule modeling in step (1) includes creating or modifying information categorization rules, that is, the user can create a new graph or modify an existing graph.
  3. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the rule-based segmentation of the retrieval sentence in step (2) proceeds as follows:
    First step, set the start index i of the string currently considered to i=0;
    Second step, starting from i, if the length of the remaining string is greater than or equal to MaxLen, take a string CutWord of length MaxLen; otherwise take the remaining substring as CutWord, where MaxLen is the length of the longest keyword in the rule keyword set;
    Third step, check whether CutWord is a word in the rule keyword set; if so, add CutWord to the segmentation result set and go to the fifth step; otherwise go to the fourth step;
    Fourth step, if the length of CutWord is 0, go to the fifth step; otherwise delete the last character of CutWord and go to the third step;
    Fifth step, delete the matched part and increase i by 1; if i is greater than or equal to the length of the retrieval string, the program stops and returns the segmentation result set; otherwise go to the second step.
  4. The information categorization method supporting user-defined categorization rules according to claim 1, characterized in that the rule-based expansion of the search keywords in step (3) proceeds as follows:
    First step, if the segmentation result set is empty, go to the ninth step; otherwise take a keyword out of it, delete it from the set, and go to the second step;
    Second step, empty the set of keywords to be expanded, add the information of the keyword taken out to the set of keywords to be expanded and to the expansion result set, set the current search layer j=2, and go to the third step;
    Third step, if j exceeds the customized number of search layers, go to the first step; otherwise increase j by 1 and go to the fourth step;
    Fourth step, if the set of keywords to be expanded is empty, go to the seventh step; otherwise select a keyword from it, delete it from the set, and go to the fifth step;
    Fifth step, taking the selected keyword as the center, search the rules to obtain the set of triples associated with it, and go to the sixth step;
    Sixth step, if the set of associated triples is empty, go to the fourth step; otherwise select a triple from it and delete it from the set, parse the triple to obtain a keyword related to the selected keyword, combine the parsed relation weight and the keyword weight into a combined weight, store the information of the related keyword, including the combined weight, in an intermediate expansion set, and go to the seventh step;
    Seventh step, remove the duplicate elements from the intermediate expansion set; if the set is empty, go to the third step; otherwise select a keyword from it and go to the eighth step;
    Eighth step, add the selected keyword to the expansion result set and check whether it has already been expanded; if not, add it to the set of keywords to be expanded; go to the seventh step;
    Ninth step, remove the duplicate elements from the expansion result set, sort it by weight in descending order, return the result, and stop the program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510262625.XA CN104899262B (en) 2015-05-22 2015-05-22 A kind of information categorization method for supporting User Defined to sort out rule

Publications (2)

Publication Number Publication Date
CN104899262A CN104899262A (en) 2015-09-09
CN104899262B true CN104899262B (en) 2017-12-22

Family

ID=54031925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510262625.XA Expired - Fee Related CN104899262B (en) 2015-05-22 2015-05-22 A kind of information categorization method for supporting User Defined to sort out rule

Country Status (1)

Country Link
CN (1) CN104899262B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815207B (en) * 2015-12-01 2020-08-11 北京国双科技有限公司 Information processing method and device for legal referee document
CN106126545A (en) * 2016-06-15 2016-11-16 北京智能管家科技有限公司 Distributed fission querying method and device
CN107577779A (en) * 2017-09-13 2018-01-12 陕西铺铺旺数字科技有限公司 Method and device based on querying condition weight proportion inquiry data groups
CN108717853B (en) * 2018-05-09 2020-11-20 深圳艾比仿生机器人科技有限公司 Man-machine voice interaction method, device and storage medium
CN117851614B (en) * 2024-03-04 2024-05-14 创意信息技术股份有限公司 Searching method, device and system for mass data and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
CN103164415A (en) * 2011-12-09 2013-06-19 富士通株式会社 Expansion keyword obtaining method based on microblog platform and equipment
CN103377226A (en) * 2012-04-25 2013-10-30 中国移动通信集团公司 Intelligent search method and system thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036937B2 (en) * 2005-12-21 2011-10-11 Ebay Inc. Computer-implemented method and system for enabling the automated selection of keywords for rapid keyword portfolio expansion

Also Published As

Publication number Publication date
CN104899262A (en) 2015-09-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171222