CN110377754A - A decision-tree-based database ontology learning optimization method - Google Patents

A decision-tree-based database ontology learning optimization method

Info

Publication number
CN110377754A
CN110377754A CN201910588441.0A CN201910588441A CN110377754A CN 110377754 A CN110377754 A CN 110377754A CN 201910588441 A CN201910588441 A CN 201910588441A CN 110377754 A CN110377754 A CN 110377754A
Authority
CN
China
Prior art keywords
attribute
decision tree
ontology
source data
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910588441.0A
Other languages
Chinese (zh)
Inventor
刘秀磊
丁嘉伟
刘旭红
张良
曹建制
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN201910588441.0A
Publication of CN110377754A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a decision-tree-based database ontology learning optimization method, comprising the following steps: determining a source data attribute table to be classified; training on the source data attribute table with a preset algorithm to generate an attribute decision tree; based on the generated attribute decision tree, constructing a source data ontology using the OWL ontology language, reading the database table fields as attributes of classes in the ontology; and, based on the generated attribute decision tree, constructing a source data ontology class using the OWL ontology language, reading each selection branch in the decision tree as an attribute, and constructing the final decision class. The invention allows data processed by a decision tree algorithm to be represented and stored effectively as an ontology. Compared with an ontology generated directly by database learning, the decision tree generated from the database can add more constraint rules to the ontology, and the hidden rules it discovers can be used to populate an existing ontology. In addition, ontology learning lets machines replace manual work, saving substantial labor costs.

Description

A decision-tree-based database ontology learning optimization method
Technical field
The present invention relates to a database ontology learning optimization method, and in particular to a decision-tree-based database ontology learning optimization method.
Background art
The purpose of an ontology is to capture the knowledge of a domain, provide a shared understanding of that domain knowledge, determine the vocabulary generally accepted in the domain, and give, through different formal models, the relationships between terms. The construction, extension, and reasoning of an ontology are all based on open-world knowledge: they do not violate the logic of the objective world, yet they can be converted into mathematical symbols for reasoning about the semantic relations between concepts, which makes them easy for computers to process.
However, many hard-to-overcome difficulties make cross-domain ontology construction extremely difficult. For example, the formal description of an ontology is an abstract method whose concrete manifestation is relatively vague, so features are hard to characterize qualitatively. This greatly increases the difficulty of conceptualizing large amounts of information as entities. Moreover, the theoretical basis of ontology construction is determined by description logic, which describes the inference mechanism for conceptualization between things and proves mathematically the feasibility and correctness of the reasoning. This guarantees the scalability of ontology construction but also makes construction harder. Domain experts outside computer science find it difficult to determine the specific description-logic method when building an ontology for their own field, and difficult to express the language of their field in a form a computer can understand. Computer experts building an ontology for another field cannot determine the relevant domain knowledge, let alone conceptualize the ontology further. As a result, domain experts usually must first determine the domain knowledge, domain concepts, and concept relations by hand, after which computer experts describe that knowledge as an ontology. This is time-consuming, laborious, and costly.
Therefore, there is an urgent need for an ontology construction method that yields ontologies with strong expressive power while saving time and effort.
Summary of the invention
In view of the above technical problems, the present invention aims to provide a decision-tree-based database ontology learning optimization method; the ontology optimized by this method has strong expressive power, and the method saves time and effort.
The technical solution adopted by the present invention is as follows:
An embodiment of the present invention provides a decision-tree-based database ontology learning optimization method, comprising the following steps:
determining a source data attribute table to be classified;
training on the source data attribute table with a preset algorithm to generate an attribute decision tree;
based on the generated attribute decision tree, constructing a source data ontology using the OWL ontology language, reading the database table fields as attributes of classes in the ontology;
based on the generated attribute decision tree, constructing a source data ontology class using the OWL ontology language, reading each selection branch in the decision tree as an attribute, and constructing the final decision class.
Optionally, the preset algorithm is the ID3 algorithm.
Optionally, training on the source data attribute table with a preset algorithm to generate an attribute decision tree specifically includes:
calculating the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, obtaining the attribute table information entropy and the field information entropies, and calculating the information gain from them;
determining the field with the maximum information gain as the attribute to classify on, and merging the records that share the same value of the classification attribute into the same sub-sample set; here, discrete data are grouped into point sets and continuous data are grouped into data intervals;
in a sub-sample set, if a single value of a category attribute is dominant (rendered as "outstanding" in the source), creating a new branch for that category attribute and taking it as a leaf node, calculating the attribute percentage of that category attribute and recording the corresponding judgment result, and returning to the calling node; otherwise, calling the algorithm again.
The decision-tree-based database ontology learning optimization method provided by the embodiments of the present invention uses a decision tree algorithm, or an improved variant of one, to mine and analyze the knowledge model of the relevant domain and thereby obtain different classification rules. The invention stores the classification rules separately as ontology concepts, takes the nodes of the decision tree as ontology properties, and stores whether each specific rule of the decision tree is satisfied as an entity of bool type, which is equivalent to matrix storage. The improved ontology conversion method contains more rule information and can provide a basis for domain decisions. Finally, the completed ontology is visualized with the Protégé tool. The resulting ontology has strong expressive power, and building it saves time and effort.
Brief description of the drawings
Fig. 1 is a flow diagram of the decision-tree-based database ontology learning optimization method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the source data attribute table used by the embodiment of the present invention;
Fig. 3 is a schematic diagram of the decision tree obtained by training on the attribute table shown in Fig. 2;
Fig. 4 is a schematic diagram of the ontology class and attribute table constructed from the decision tree shown in Fig. 3 using the OWL ontology language;
Fig. 5 is a schematic diagram of the death rules constructed from the decision tree shown in Fig. 3 using the OWL ontology language.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
One characteristic of an ontology is its language relevance; owing to this characteristic, mapping relations can be established between ontology documents and relational databases. Inspired by this feature, the present invention combines ontologies with a decision tree algorithm by way of a relational database and proposes a decision-tree-based database ontology learning optimization method. Example: fields a, b, and c suitable for ontology construction are found in structured data; the decision tree reveals a relationship of the form a, b → c, and this relationship is then represented in the knowledge base. On the one hand this saves experts' time; on the other hand it can even uncover data relationships that experts in the field would find hard to discover.
As shown in Fig. 1, the decision-tree-based database ontology learning optimization method provided by an embodiment of the present invention includes the following steps:
S101, determining a source data attribute table to be classified.
In this step, the source data attribute table to be classified can be chosen from an existing database, for example from a self-built database.
S102, training on the source data attribute table with a preset algorithm to generate an attribute decision tree.
In this step, the preset algorithm may be the ID3 algorithm. The step may specifically include:
Step 1: calculate the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, obtaining the attribute table information entropy and the field information entropies, and calculate the information gain from them.
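The information entropy I of the attribute table can be calculated by the following formula (given here in the standard ID3 form, consistent with the symbols defined in the text):
$$I = \mathrm{Info}(D) = -\sum_{i} p_i \log_2 p_i$$
where D is the entire attribute table and $p_i$ is the probability of the i-th value.
The information entropy K of each field can be calculated by the following formula (again in the standard ID3 form):
$$K = \mathrm{Info}_A(D) = \sum_{j} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$$
where $|D|$ is the length of the attribute table, $|D_j|$ is the length of the j-th subset obtained by splitting on the field, and $\mathrm{Info}(D_j)$ is the information entropy of that subset.
The information gain equals the difference between the information entropy I of the attribute table and the information entropy K of each field, that is, I − K.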
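Step 1 can be illustrated with a minimal Python sketch, assuming the standard ID3 definitions of entropy and information gain. The table contents, the field names `sex`, `cabin`, and `survived`, and the function names are hypothetical and are not taken from the patent.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the distinct label values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def field_entropy(rows, field, target):
    """Info_A(D) = sum(|D_j| / |D| * Info(D_j)) over the partitions made by `field`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[field], []).append(row[target])
    total = len(rows)
    return sum(len(part) / total * entropy(part) for part in partitions.values())


def information_gain(rows, field, target):
    """Gain = I - K: table entropy minus the entropy left after splitting on `field`."""
    table_entropy = entropy([row[target] for row in rows])
    return table_entropy - field_entropy(rows, field, target)


# Hypothetical toy attribute table (illustrative values only).
ROWS = [
    {"sex": "female", "cabin": "first", "survived": "yes"},
    {"sex": "female", "cabin": "third", "survived": "yes"},
    {"sex": "male", "cabin": "first", "survived": "no"},
    {"sex": "male", "cabin": "third", "survived": "no"},
]

gain_sex = information_gain(ROWS, "sex", "survived")
gain_cabin = information_gain(ROWS, "cabin", "survived")
```

In this toy table `sex` separates the outcomes completely while `cabin` does not separate them at all, so `sex` would be the field selected in Step 2.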
Step 2: determine the field with the maximum information gain as the attribute to classify on, and merge the records that share the same value of the classification attribute into the same sub-sample set. Discrete data are grouped into point sets, and continuous data are grouped into data intervals.
Step 3: in a sub-sample set, if a single value of a category attribute is dominant (rendered as "outstanding" in the source), create a new branch for that category attribute and take it as a leaf node, calculate the attribute percentage of that category attribute and record the corresponding judgment result, and return to the calling node; otherwise, call the algorithm again.
In this step, a value counts as dominant when its share of the results is very large compared with the remaining values. The specific threshold is not particularly limited and can be determined according to the actual situation.
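Steps 2 and 3 can be sketched as a recursive procedure. The following Python sketch is only an illustration under stated assumptions: the dominance test is modelled as a `threshold` on the share of the most common target value (0.9 here is an arbitrary choice, since the text leaves the value open), continuous values are binned with a hypothetical `to_interval` helper, and the data and field names are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, field, target):
    parts = {}
    for row in rows:
        parts.setdefault(row[field], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy([row[target] for row in rows]) - remainder


def to_interval(value, edges):
    """Map a continuous value to a half-open interval label such as '[18, inf)'."""
    bounds = [float("-inf"), *edges, float("inf")]
    for low, high in zip(bounds, bounds[1:]):
        if low <= value < high:
            return f"[{low}, {high})"


def build_tree(rows, fields, target, threshold=0.9):
    """Split on the highest-gain field until one target value dominates a subset."""
    value, count = Counter(row[target] for row in rows).most_common(1)[0]
    share = count / len(rows)
    # Leaf: one value is dominant (share >= threshold) or no fields remain.
    if share >= threshold or not fields:
        return {"leaf": value, "share": share}
    best = max(fields, key=lambda f: gain(rows, f, target))
    rest = [f for f in fields if f != best]
    children = {}
    for v in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == v]
        children[v] = build_tree(subset, rest, target, threshold)
    return {"field": best, "children": children}


# Hypothetical records: (sex, age, survived). Ages are binned at 18.
RAW = [
    ("female", 30, "yes"), ("female", 8, "yes"), ("female", 45, "yes"),
    ("male", 40, "no"), ("male", 35, "no"), ("male", 6, "yes"), ("male", 50, "no"),
]
ROWS = [{"sex": s, "age": to_interval(a, [18]), "survived": y} for s, a, y in RAW]
TREE = build_tree(ROWS, ["sex", "age"], "survived")
```

Each leaf keeps the share of its dominant value, which corresponds to the attribute percentage recorded in Step 3.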
S103, based on the generated attribute decision tree, constructing a source data ontology using the OWL ontology language, reading the database table fields as attributes of classes in the ontology.
In this step, the Protégé visual ontology construction software can be used: the decision tree generated by the ID3 algorithm above is loaded and the ontology is constructed automatically.
S104, based on the generated attribute decision tree, constructing a source data ontology class using the OWL ontology language, reading each selection branch in the decision tree as an attribute, and constructing the final decision class.
In this step, the Protégé visual ontology construction software can be used: the decision tree generated by the ID3 algorithm above is loaded and the ontology is constructed automatically.
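The mapping in S103, from database table fields to attributes of an ontology class, can be sketched without any ontology library by emitting OWL in Turtle syntax as plain text. This is an illustrative assumption, not the patent's tooling: the class name `Passenger`, the base IRI, the field names, and the Python-to-XSD type mapping are all hypothetical.

```python
# Assumed mapping from Python value types to XSD datatypes.
XSD_TYPES = {str: "string", int: "integer", float: "decimal", bool: "boolean"}


def table_to_turtle(class_name, sample_row, base="http://example.org/shipwreck#"):
    """Emit one owl:Class for the table and one owl:DatatypeProperty per field."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f":{class_name} a owl:Class .",
    ]
    for field, value in sample_row.items():
        xsd = XSD_TYPES.get(type(value), "string")
        lines.append(
            f":{field} a owl:DatatypeProperty ; "
            f"rdfs:domain :{class_name} ; rdfs:range xsd:{xsd} ."
        )
    return "\n".join(lines)


# Hypothetical sample row standing in for one record of the attribute table.
TURTLE = table_to_turtle("Passenger", {"sex": "female", "age": 30, "survived": True})
print(TURTLE)
```

The sketch assumes field names are already valid Turtle local names; real column names may need to be sanitized first.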
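For S104, one possible reading is that every selection branch of the tree becomes a boolean property of the decision class. The Python sketch below follows that reading and again emits Turtle text; the nested-dict tree layout, the class name `DeathRule`, the base IRI, and the property naming scheme are assumptions made for illustration.

```python
import re

PREFIXES = "\n".join([
    "@prefix : <http://example.org/shipwreck#> .",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])

# Hypothetical decision tree in nested-dict form; leaves carry the outcome.
TREE = {
    "field": "sex",
    "children": {
        "female": {"leaf": "survived"},
        "male": {
            "field": "age",
            "children": {
                "under 18": {"leaf": "survived"},
                "18 or over": {"leaf": "died"},
            },
        },
    },
}


def branches(node):
    """Return every (field, value) branch of the tree, depth-first."""
    if "leaf" in node:
        return []
    found = []
    for value, child in node["children"].items():
        found.append((node["field"], value))
        found.extend(branches(child))
    return found


def decision_class_turtle(tree, class_name="DeathRule"):
    """Declare the decision class plus one boolean property per branch."""
    lines = [PREFIXES, "", f":{class_name} a owl:Class ."]
    for field, value in branches(tree):
        # Replace spaces and punctuation so the name is a valid local name.
        prop = re.sub(r"\W+", "_", f"{field}_{value}")
        lines.append(
            f":{prop} a owl:DatatypeProperty ; "
            f"rdfs:domain :{class_name} ; rdfs:range xsd:boolean ."
        )
    return "\n".join(lines)


DECISION_TTL = decision_class_turtle(TREE)
print(DECISION_TTL)
```

Using boolean properties keeps each rule expressible as a set of true or false flags, which matches the bool-type storage the description mentions.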
[Embodiment]
Step 1: the shipwreck personnel attribute table of a self-built database is chosen as the data source; the shipwreck personnel attribute table is shown in Fig. 2.
Step 2: the ID3 algorithm is used to train on the data source and generate the shipwreck personnel attribute decision tree; the resulting decision tree is shown in Fig. 3.
Step 3: the shipwreck passenger ontology is constructed using the OWL ontology language, reading the database table fields as attributes of classes in the ontology, as shown in Fig. 4.
Step 4: the shipwreck passenger class is constructed using the OWL ontology language, and each selection branch in the decision tree is read as an attribute to construct the death rule class. Each specific rule, together with the death probability when the rule is satisfied, is stored as an instance, as shown in Fig. 5.
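The storage of rules as instances can be sketched as follows. Each root-to-leaf path becomes one rule; the rules are shown both as a boolean matrix (rows are rules, columns are branch conditions) and as Turtle individuals. The tree, the probabilities, and the names `DeathRule`, `deathProbability`, and `rule_N` are hypothetical and are not the values shown in the patent's figures.

```python
# Hypothetical tree whose leaves carry an illustrative death probability.
TREE = {
    "field": "sex",
    "children": {
        "female": {"p_death": 0.25},
        "male": {
            "field": "age",
            "children": {"child": {"p_death": 0.5}, "adult": {"p_death": 0.8}},
        },
    },
}


def rules(node, path=()):
    """Yield (conditions, death probability) for every root-to-leaf path."""
    if "p_death" in node:
        yield path, node["p_death"]
        return
    for value, child in node["children"].items():
        yield from rules(child, path + (f"{node['field']}_{value}",))


RULES = list(rules(TREE))

# Boolean matrix view: one row per rule, one column per branch condition.
COLUMNS = sorted({cond for conds, _ in RULES for cond in conds})
MATRIX = [[col in conds for col in COLUMNS] for conds, _ in RULES]


def rule_individuals(found):
    """Serialize each rule as a Turtle individual of the assumed :DeathRule class."""
    lines = []
    for number, (conds, p_death) in enumerate(found, start=1):
        facts = " ; ".join(f":{cond} true" for cond in conds)
        lines.append(
            f":rule_{number} a :DeathRule ; {facts} ; :deathProbability {p_death} ."
        )
    return "\n".join(lines)


INDIVIDUALS = rule_individuals(RULES)
print(INDIVIDUALS)
```

The output omits prefix declarations, so it would need to be appended to a Turtle document that declares the default prefix and the properties used.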
In summary, the decision-tree-based database ontology learning optimization method provided by the embodiments of the present invention uses a decision tree algorithm, or an improved variant of one, to mine and analyze the knowledge model of the relevant domain and thereby obtain different classification rules. The invention stores the classification rules separately as ontology concepts, takes the nodes of the decision tree as ontology properties, and stores whether each specific rule of the decision tree is satisfied as an entity of bool type, which is equivalent to matrix storage. Compared with an ontology generated directly by database learning, the decision tree generated from the database under the improved ontology conversion method can add more constraint rules to the ontology, and the hidden rules it discovers can be used to populate an existing ontology and provide a basis for domain decisions. Finally, the completed ontology is visualized with the Protégé tool. Machines can thus replace manual work and save substantial labor costs, so that the resulting ontology has strong expressive power and is built with less time and effort.
The embodiments described above are only specific implementations of the present invention, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (3)

1. A decision-tree-based database ontology learning optimization method, characterized by comprising the following steps:
determining a source data attribute table to be classified;
training on the source data attribute table with a preset algorithm to generate an attribute decision tree;
based on the generated attribute decision tree, constructing a source data ontology using the OWL ontology language, reading the database table fields as attributes of classes in the ontology;
based on the generated attribute decision tree, constructing a source data ontology class using the OWL ontology language, reading each selection branch in the decision tree as an attribute, and constructing the final decision class.
2. The method according to claim 1, characterized in that the preset algorithm is the ID3 algorithm.
3. The method according to claim 1 or 2, characterized in that training on the source data attribute table with a preset algorithm to generate an attribute decision tree specifically includes:
calculating the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, obtaining the attribute table information entropy and the field information entropies, and calculating the information gain from them;
determining the field with the maximum information gain as the attribute to classify on, and merging the records that share the same value of the classification attribute into the same sub-sample set; wherein discrete data are grouped into point sets and continuous data are grouped into data intervals;
in a sub-sample set, if a single value of a category attribute is dominant, creating a new branch for that category attribute and taking it as a leaf node, calculating the attribute percentage of that category attribute and recording the corresponding judgment result, and returning to the calling node; otherwise, calling the algorithm again.
CN201910588441.0A 2019-07-01 2019-07-01 A decision-tree-based database ontology learning optimization method Pending CN110377754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910588441.0A CN110377754A (en) 2019-07-01 2019-07-01 A decision-tree-based database ontology learning optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910588441.0A CN110377754A (en) 2019-07-01 2019-07-01 A decision-tree-based database ontology learning optimization method

Publications (1)

Publication Number Publication Date
CN110377754A (en) 2019-10-25

Family

ID=68251635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910588441.0A Pending CN110377754A (en) 2019-07-01 2019-07-01 A decision-tree-based database ontology learning optimization method

Country Status (1)

Country Link
CN (1) CN110377754A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404674A (en) * 2015-11-20 2016-03-16 焦点科技股份有限公司 Knowledge-dependent webpage information extraction method
CN106096647A (en) * 2016-06-08 2016-11-09 哈尔滨工程大学 A kind of RLID3 data classification method based on decision tree optimization rate
CN107194468A (en) * 2017-04-19 2017-09-22 哈尔滨工程大学 Towards the decision tree Increment Learning Algorithm of information big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁嘉伟 et al.: "A decision-tree-based database ontology learning optimization method", 《电视技术》, vol. 43, no. 4, pages 6-10 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104466A (en) * 2019-12-25 2020-05-05 航天科工网络信息发展有限公司 Method for rapidly classifying massive database tables
CN111104466B (en) * 2019-12-25 2023-07-28 中国长峰机电技术研究设计院 Method for quickly classifying massive database tables

Similar Documents

Publication Publication Date Title
Liu et al. A consensus model for hesitant fuzzy linguistic group decision-making in the framework of Dempster–Shafer evidence theory
CN111914096B (en) Public opinion knowledge graph-based public transportation passenger satisfaction evaluation method and system
CN111127385B (en) Medical information cross-modal Hash coding learning method based on generative countermeasure network
Rotshtein et al. Fuzzy evidence in identification, forecasting and diagnosis
CN110457442A (en) The knowledge mapping construction method of smart grid-oriented customer service question and answer
CN106897568A (en) The treating method and apparatus of case history structuring
Ni Research of data mining based on neural networks
CN106779087A (en) A kind of general-purpose machinery learning data analysis platform
Liu et al. The development of fuzzy rough sets with the use of structures and algebras of axiomatic fuzzy sets
Zhou et al. Risk evaluation of dynamic alliance based on fuzzy analytic network process and fuzzy TOPSIS
CN114999610A (en) Deep learning-based emotion perception and support dialog system construction method
CN111008215B (en) Expert recommendation method combining label construction and community relation avoidance
CN110377754A (en) A decision-tree-based database ontology learning optimization method
Naz et al. A new approach to sentiment analysis algorithms: Extended SWARA-MABAC method with 2-tuple linguistic q-rung orthopair fuzzy information
Ni et al. Rapid generation of emergency response plans for unconventional emergencies
Boongoen et al. Fuzzy qualitative link analysis for academic performance evaluation
Xing et al. Rapid development of knowledge-based systems via integrated knowledge acquisition
Hassan et al. Rough neural classifier system
Martin et al. An analysis on qualitative bankruptcy prediction using fuzzy ID3 and ant colony optimization algorithm
CN113254468A (en) Fault query and reasoning method for certain type of equipment
Rasmani et al. Subsethood-based fuzzy rule models and their application to student performance classification
CN111694952A (en) Big data analysis model system based on microblog and implementation method thereof
Singh et al. Adaptive genetic programming based linkage rule miner for entity linking in Semantic Web
Hou et al. A study of intelligent decision-making system based on neural networks and expert system
CN108549665A (en) A kind of text classification scheme of human-computer interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination