CN110377754A - A decision-tree-based database ontology learning optimization method - Google Patents
A decision-tree-based database ontology learning optimization method
- Publication number
- CN110377754A
- Authority
- CN
- China
- Prior art keywords
- attribute
- decision tree
- ontology
- source data
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a decision-tree-based database ontology learning optimization method comprising the following steps: determining a source data attribute table to be classified; training on the source data attribute table with a preset algorithm to generate an attribute decision tree; based on the generated attribute decision tree, constructing a source data ontology in the OWL ontology language, reading the database table fields as attributes of the classes in the ontology; and, based on the generated attribute decision tree, constructing source data ontology classes in the OWL ontology language, reading each selection branch of the decision tree as an attribute and constructing the final decision class. The invention allows data processed by a decision tree algorithm to be represented and stored effectively as an ontology. Compared with an ontology learned directly from the database, the decision tree generated from the database can add more constraint rules to the ontology, and the hidden rules it discovers can be used to populate an existing ontology. In addition, ontology learning lets machines replace manual work, saving substantial labor cost.
Description
Technical field
The present invention relates to database ontology learning optimization methods, and in particular to a decision-tree-based database ontology learning optimization method.
Background art
The purpose of an ontology is to capture the knowledge of a domain, provide a shared understanding of that knowledge, identify the vocabulary commonly accepted in the domain, and define, through formal models at different levels, the terms and the relationships between them. The construction, extension, and reasoning of an ontology are all based on open-world knowledge; that is, they do not contradict the logic of the objective world, yet they can be converted into mathematical symbols for reasoning about the semantic relationships between concepts, which makes them easy for computers to process.
However, many difficulties make building ontologies across domains extremely hard. For example, the formal description of an ontology is an abstract method whose concrete form is relatively vague, which makes features hard to characterize qualitatively and greatly increases the difficulty of turning large amounts of information into conceptual entities. Moreover, the theoretical basis of ontology construction is description logic, which describes the inference mechanism for conceptualizing things and proves mathematically that the reasoning is feasible and correct. This guarantees the extensibility of ontology construction but also makes construction harder. Domain experts without a computing background find it difficult to choose a concrete description-logic method when building an ontology for their own field, and to express the language of their field in a form that computers can understand. Computing experts building an ontology for another field, in turn, cannot identify the relevant domain knowledge, let alone conceptualize it further. As a result, domain experts usually first have to determine the domain knowledge, concepts, and conceptual relations by hand, and computing experts then have to describe that knowledge as an ontology, which is time-consuming, laborious, and costly.
There is therefore an urgent need for an ontology construction method that yields a highly expressive ontology while saving time and effort.
Summary of the invention
In view of the above technical problem, the present invention aims to provide a decision-tree-based database ontology learning optimization method; the ontology optimized by this method is highly expressive, and the method saves time and effort.
The technical solution adopted by the present invention is as follows:
An embodiment of the present invention provides a decision-tree-based database ontology learning optimization method comprising the following steps:
Determining a source data attribute table to be classified;
Training on the source data attribute table with a preset algorithm to generate an attribute decision tree;
Based on the generated attribute decision tree, constructing a source data ontology in the OWL ontology language, reading the database table fields as attributes of the classes in the ontology;
Based on the generated attribute decision tree, constructing source data ontology classes in the OWL ontology language, reading each selection branch of the decision tree as an attribute, and constructing the final decision class.
Optionally, the preset algorithm is the ID3 algorithm.
Optionally, training on the source data attribute table with the preset algorithm to generate the attribute decision tree specifically includes:
Calculating the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, to obtain the attribute table entropy and the field entropies, and calculating the information gain from them;
Taking the field with the largest information gain as the attribute to classify on, and merging the records that share the same value of that classification attribute into one subsample set, where discrete data are grouped into sets of points and continuous data are grouped into data intervals;
Within a subsample set, if a single value of the class attribute is dominant, creating a new branch for that class attribute as a leaf node, calculating the percentage of the class attribute, recording the corresponding decision result, and returning to the calling node; otherwise, calling the algorithm again.
In the decision-tree-based database ontology learning optimization method provided by the embodiments of the present invention, a decision tree algorithm or an improved variant of it is used to mine and analyze the model of the relevant domain knowledge and obtain different classification rules. The invention stores each classification rule separately as an ontology concept, uses the decision tree nodes as ontology properties, and stores the specific rules satisfied in the decision tree as Boolean-typed entities, which is equivalent to matrix storage. The improved ontology conversion method carries more rule information and can support decision-making in the domain. Finally, the Protégé tool is used to visualize the constructed ontology. The resulting ontology is highly expressive, and the method saves time and effort.
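One possible reading of "Boolean-typed entities, equivalent to matrix storage" is a rule-by-condition Boolean matrix. The sketch below illustrates that reading only; the encoding, the rule names, and the condition names are assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical rules: each rule is the set of branch conditions it requires.
rules = {
    "rule_1": {"sex=female", "class=first"},
    "rule_2": {"sex=male", "age=adult"},
}

def to_bool_matrix(rules):
    """Return (sorted condition names, {rule name: [bool per condition]})."""
    conditions = sorted({c for conds in rules.values() for c in conds})
    matrix = {
        name: [c in conds for c in conditions]
        for name, conds in rules.items()
    }
    return conditions, matrix

conditions, matrix = to_bool_matrix(rules)
```

Each row of `matrix` marks which conditions a rule uses, so every rule can be stored as a fixed set of Boolean property values.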
Description of the drawings
Fig. 1 is a flow diagram of the decision-tree-based database ontology learning optimization method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the source data attribute table used in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the decision tree obtained by training on the attribute table shown in Fig. 2;
Fig. 4 is a schematic diagram of the ontology classes and attribute list constructed in the OWL ontology language from the decision tree shown in Fig. 3;
Fig. 5 is a schematic diagram of the death rules constructed in the OWL ontology language from the decision tree shown in Fig. 3.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
One characteristic of an ontology is its language correlation; with this characteristic, a mapping can be established between ontology documents and relational databases. Inspired by this, the present invention combines ontologies built from relational databases with decision tree algorithms and proposes a decision-tree-based database ontology learning optimization method. For example, fields a, b, and c suitable for ontology construction are found in structured data, a relationship of the form a, b → c is discovered by the decision tree, and this relationship is then represented in the knowledge base. On the one hand this saves experts' time; on the other hand it may even uncover data relationships that experts in the field would find hard to discover.
As shown in Fig. 1, the decision-tree-based database ontology learning optimization method provided by an embodiment of the present invention includes the following steps:
S101: Determine the source data attribute table to be classified.
In this step, the source data attribute table to be classified can be selected from an existing database, for example a self-built database.
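As a minimal sketch of step S101, the snippet below reads one table from a relational database into field names and rows. It assumes SQLite purely for illustration; the table name `passenger`, its columns, and the helper `load_attribute_table` are hypothetical and do not come from the patent.

```python
import sqlite3

def load_attribute_table(conn, table):
    """Return (field names, rows) for one table of an existing database."""
    # Table names cannot be bound as SQL parameters, so validate before quoting.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.execute(f'SELECT * FROM "{table}"')
    fields = [col[0] for col in cur.description]
    return fields, cur.fetchall()

# Hypothetical in-memory data standing in for a self-built database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (sex TEXT, cabin_class TEXT, died TEXT)")
conn.executemany(
    "INSERT INTO passenger VALUES (?, ?, ?)",
    [("male", "third", "yes"), ("female", "first", "no")],
)
fields, rows = load_attribute_table(conn, "passenger")
```

The field names returned here are what the later steps would treat as candidate attributes.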
S102: Train on the source data attribute table with a preset algorithm to generate an attribute decision tree.
In this step, the preset algorithm may be the ID3 algorithm. The step may specifically include:
Step 1: Calculate the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, to obtain the attribute table entropy and the field entropies, and calculate the information gain from them.
The information entropy I of the attribute table can be calculated by the following formula:
where D is the entire attribute table, |D| is the length of the attribute table, |D_j| is the length of the j-th attribute, Info(D_j) is the information entropy of the j-th attribute, and p_i is the probability of the i-th value.
The information entropy K of each field can be calculated by the following formula:
The information gain equals the difference between the information entropy K of each field and the information entropy I of the attribute table, i.e., K − I.
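The formulas referred to above are not present in this text. For orientation only, the standard ID3 definitions that use the same symbols (D, |D|, |D_j|, Info(D_j), p_i) are given below. The symbols m (number of class values), v (number of distinct values of field A), A, and Gain are introduced here as assumptions, and which of these quantities the text labels I and which K cannot be confirmed from the surviving wording.

```latex
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i
\qquad
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)
\qquad
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)
```

Under these standard definitions the gain is non-negative and is the entropy of the whole table minus the weighted entropy after splitting on field A; the text states the difference as K − I.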
Step 2: Take the field with the largest information gain as the attribute to classify on, and merge the records that share the same value of that classification attribute into one subsample set, where discrete data are grouped into sets of points and continuous data are grouped into data intervals.
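Steps 1 and 2 can be sketched as follows. This is an illustration using the standard ID3 gain (entropy of the target column minus its weighted entropy after a split), not the patent's own implementation; the sample rows, the field names, and the cut points passed to `to_interval` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, field, target):
    """Entropy of the target minus its weighted entropy after splitting on field."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    split = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - split

def best_field(rows, fields, target):
    """Field with the largest information gain."""
    return max(fields, key=lambda f: information_gain(rows, f, target))

def to_interval(value, edges):
    """Map a continuous value to a labelled interval; edges are assumed cut points."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"[{lo}, {hi})"
    return f">= {edges[-1]}" if value >= edges[-1] else f"< {edges[0]}"

# Hypothetical sample rows.
rows = [
    {"sex": "male", "cabin": "third", "died": "yes"},
    {"sex": "male", "cabin": "first", "died": "yes"},
    {"sex": "female", "cabin": "third", "died": "no"},
    {"sex": "female", "cabin": "first", "died": "no"},
]
chosen = best_field(rows, ["sex", "cabin"], "died")
```

In this toy table `sex` separates the classes perfectly, so it is selected; a continuous field such as age would first be mapped to intervals with `to_interval`.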
Step 3: Within a subsample set, if a single value of the class attribute is dominant, create a new branch for that class attribute as a leaf node, calculate the percentage of the class attribute, record the corresponding decision result, and return to the calling node; otherwise, call the algorithm again.
In this step, "dominant" means that the value accounts for a much larger share of the results than the other values. The specific threshold for dominance is not particularly limited and can be set according to the actual situation.
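The recursion in step 3 can be sketched as below. The dominance threshold of 0.9 is an assumption, since the text leaves the value open; the tree layout (dicts with `label`/`share` or `field`/`branches`) and the sample rows are also hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, field, target):
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    split = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - split

def build_tree(rows, fields, target, threshold=0.9):
    """Return a leaf {'label', 'share'} or a node {'field', 'branches'}."""
    label, count = Counter(r[target] for r in rows).most_common(1)[0]
    share = count / len(rows)
    # Stop when one class value dominates, or when no field is left to split on.
    if share >= threshold or not fields:
        return {"label": label, "share": share}
    field = max(fields, key=lambda f: gain(rows, f, target))
    rest = [f for f in fields if f != field]
    subsets = {}
    for r in rows:
        subsets.setdefault(r[field], []).append(r)
    # Otherwise call the algorithm again on each subsample set.
    return {
        "field": field,
        "branches": {v: build_tree(s, rest, target, threshold)
                     for v, s in subsets.items()},
    }

# Hypothetical sample rows.
rows = [
    {"sex": "male", "cabin": "third", "died": "yes"},
    {"sex": "male", "cabin": "first", "died": "yes"},
    {"sex": "female", "cabin": "third", "died": "no"},
    {"sex": "female", "cabin": "first", "died": "no"},
]
tree = build_tree(rows, ["sex", "cabin"], "died")
```

Each leaf keeps the share of its dominant value, which corresponds to the percentage recorded with the decision result.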
S103: Based on the generated attribute decision tree, construct a source data ontology in the OWL ontology language, reading the database table fields as attributes of the classes in the ontology.
In this step, the Protégé visual ontology construction software can be used: the decision tree generated by the ID3 algorithm above is loaded and the ontology is constructed automatically.
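The patent performs this step in Protégé. As a rough illustration of the resulting structure only, the sketch below writes one OWL class and one datatype property per table field as Turtle text using plain string formatting. The base IRI, the class name `Passenger`, and the field names are hypothetical, and modelling each field as an `owl:DatatypeProperty` is an assumption.

```python
def table_to_turtle(class_name, fields, base="http://example.org/onto#"):
    """Serialize one class plus one datatype property per table field as Turtle."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix : <{base}> .",
        "",
        f":{class_name} a owl:Class .",
    ]
    for field in fields:
        # Each table field becomes a property whose domain is the class.
        lines.append(
            f":{field} a owl:DatatypeProperty ; rdfs:domain :{class_name} ."
        )
    return "\n".join(lines)

turtle = table_to_turtle("Passenger", ["sex", "cabin_class", "age"])
```

The resulting text can be saved to a `.ttl` file and opened in an ontology editor for inspection.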
S104: Based on the generated attribute decision tree, construct source data ontology classes in the OWL ontology language, reading each selection branch of the decision tree as an attribute and constructing the final decision class.
In this step, the Protégé visual ontology construction software can be used: the decision tree generated by the ID3 algorithm above is loaded and the ontology is constructed automatically.
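Reading "each selection branch" of the tree amounts to walking every root-to-leaf path. A minimal sketch follows, assuming a nested-dict tree layout (`field`/`branches` for nodes, `label`/`share` for leaves); that layout and all values in the sample tree are hypothetical.

```python
def extract_rules(tree, path=()):
    """Yield (conditions, label, share) for every root-to-leaf path."""
    if "label" in tree:
        yield list(path), tree["label"], tree["share"]
        return
    for value, child in tree["branches"].items():
        yield from extract_rules(child, path + ((tree["field"], value),))

# Hypothetical tree; the shares are made-up numbers for illustration.
tree = {
    "field": "sex",
    "branches": {
        "male": {"label": "died", "share": 0.8},
        "female": {
            "field": "cabin_class",
            "branches": {
                "first": {"label": "survived", "share": 0.95},
                "third": {"label": "died", "share": 0.55},
            },
        },
    },
}
rules = list(extract_rules(tree))
```

Each extracted rule lists the branch conditions along one path together with the decision at the leaf, which is the information a decision class would need to carry.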
[Embodiment]
Step 1: Select the shipwreck personnel attribute table from a self-built database as the data source, as shown in Fig. 2.
Step 2: Train on the data source with the ID3 algorithm to generate the shipwreck personnel attribute decision tree; the resulting decision tree is shown in Fig. 3.
Step 3: Construct the shipwreck passenger ontology in the OWL ontology language, reading the database table fields as attributes of the classes in the ontology, as shown in Fig. 4.
Step 4: Construct the shipwreck passenger class in the OWL ontology language, reading each selection branch of the decision tree as an attribute to construct the death rule class. The specific rules, and the probability of death when a rule is satisfied, are stored as instances, as shown in Fig. 5.
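Storing a rule and its death probability as an instance could look like the sketch below, which emits one Turtle individual per rule. The class name `:DeathRule`, the property naming scheme, the `:deathProbability` property, and the value 0.8 are all hypothetical, and the `:` and `xsd:` prefix declarations are assumed to exist elsewhere in the file.

```python
def rule_to_turtle(rule_id, conditions, probability):
    """Serialize one rule as an individual of a hypothetical :DeathRule class."""
    parts = [f":{rule_id} a :DeathRule"]
    for field, value in conditions:
        # One Boolean property per branch condition, e.g. :sex_male true.
        parts.append(f"    :{field}_{value} true")
    parts.append(f'    :deathProbability "{probability}"^^xsd:decimal')
    return " ;\n".join(parts) + " ."

snippet = rule_to_turtle(
    "rule_1", [("sex", "male"), ("cabin_class", "third")], 0.8
)
```

Here each branch condition that the rule satisfies is recorded as a Boolean property on the rule instance, alongside the probability attached to that rule.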
In summary, in the decision-tree-based database ontology learning optimization method provided by the embodiments of the present invention, a decision tree algorithm or an improved variant of it is used to mine and analyze the model of the relevant domain knowledge and obtain different classification rules. The invention stores each classification rule separately as an ontology concept, uses the decision tree nodes as ontology properties, and stores the specific rules satisfied in the decision tree as Boolean-typed entities, which is equivalent to matrix storage. Compared with an ontology learned directly from the database, the improved ontology conversion method lets the decision tree generated from the database add more constraint rules to the ontology; the hidden rules it discovers can be used to populate an existing ontology and can support decision-making in the domain. Finally, the Protégé tool is used to visualize the constructed ontology. Machines can thus replace manual work and save substantial labor cost, so that the constructed ontology is highly expressive and the method saves time and effort.
The embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solution rather than to limit it, and the scope of protection of the present invention is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field can, within the technical scope disclosed by the invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (3)
1. A decision-tree-based database ontology learning optimization method, characterized by comprising the following steps:
Determining a source data attribute table to be classified;
Training on the source data attribute table with a preset algorithm to generate an attribute decision tree;
Based on the generated attribute decision tree, constructing a source data ontology in the OWL ontology language, reading the database table fields as attributes of the classes in the ontology;
Based on the generated attribute decision tree, constructing source data ontology classes in the OWL ontology language, reading each selection branch of the decision tree as an attribute, and constructing the final decision class.
2. The method according to claim 1, characterized in that the preset algorithm is the ID3 algorithm.
3. The method according to claim 1 or 2, characterized in that training on the source data attribute table with the preset algorithm to generate the attribute decision tree specifically includes:
Calculating the information entropy of the attribute table to be classified and the information entropy of each field in the attribute table, to obtain the attribute table entropy and the field entropies, and calculating the information gain from them;
Taking the field with the largest information gain as the attribute to classify on, and merging the records that share the same value of that classification attribute into one subsample set, where discrete data are grouped into sets of points and continuous data are grouped into data intervals;
Within a subsample set, if a single value of the class attribute is dominant, creating a new branch for that class attribute as a leaf node, calculating the percentage of the class attribute, recording the corresponding decision result, and returning to the calling node; otherwise, calling the algorithm again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910588441.0A CN110377754A (en) | 2019-07-01 | 2019-07-01 | A decision-tree-based database ontology learning optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110377754A | 2019-10-25 |
Family
ID=68251635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910588441.0A Pending CN110377754A (en) | A decision-tree-based database ontology learning optimization method | 2019-07-01 | 2019-07-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377754A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105404674A (en) * | 2015-11-20 | 2016-03-16 | 焦点科技股份有限公司 | Knowledge-dependent webpage information extraction method |
CN106096647A (en) * | 2016-06-08 | 2016-11-09 | 哈尔滨工程大学 | A kind of RLID3 data classification method based on decision tree optimization rate |
CN107194468A (en) * | 2017-04-19 | 2017-09-22 | 哈尔滨工程大学 | Towards the decision tree Increment Learning Algorithm of information big data |
Non-Patent Citations (1)
Title |
---|
丁嘉伟 等: "一种基于决策树的数据库本体学习优化方法", 《电视技术》, vol. 43, no. 4, pages 6 - 10 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104466A (en) * | 2019-12-25 | 2020-05-05 | 航天科工网络信息发展有限公司 | Method for rapidly classifying massive database tables |
CN111104466B (en) * | 2019-12-25 | 2023-07-28 | 中国长峰机电技术研究设计院 | Method for quickly classifying massive database tables |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||